Abstract

The nonparametric estimation of the volatility and the drift coefficient of a scalar diffusion is studied when the process is observed at random time points. The constructed estimator generalizes the spectral method by Gobet, Hoffmann and Reiß [Ann. Statist. 32 (2004), 2223-2253]. The estimation procedure is optimal in the minimax sense and adaptive with respect to the sampling time distribution and the regularity of the coefficients. The proofs are based on the eigenvalue problem for the generalized transition operator. The finite sample performance is illustrated in a numerical example.
1 Introduction

For decades, diffusion models have been used to describe the dynamics of continuous stochastic processes, for instance, stock prices in econometrics or particle movements in biology and physics. The statistical properties of diffusion models depend essentially on the observation scheme, where it is natural to assume discrete observations of the process. Mostly, equidistant observations are studied in the literature. The literature distinguishes between high-frequency and low-frequency observations, depending on whether the observation distance tends to zero or remains fixed. A summary of parametric methods is given by Aït-Sahalia [?]. Nonparametric estimation methods are surveyed by Fan [13].

As argued by Aït-Sahalia and Mykland [?], assuming equidistant observations might, however, be unrealistic in many applications, and random sampling times should be considered instead. For parametric estimation problems, Aït-Sahalia and Mykland [3, ?] have shown that random sampling has a strong effect on the statistical problem and on the performance of estimators. Naturally, two questions arise: how nonparametric estimators can be constructed for random sampling times, and whether their (asymptotic) behavior is similar to or worse than for equidistant observations.

In order to study the nonparametric estimation of the drift and the volatility coefficient of the diffusion when the process is observed at random times, we generalize the low-frequency results by Gobet et al. [14]. As they do, we consider a reflected scalar diffusion on a one-dimensional interval.

On the one hand, this allows us to avoid technical difficulties and to present more transparent proofs when investigating spectral properties of the transition semigroup. On the other hand, diffusions with reflecting barriers have rich applications:

- In the finance and economics literature, reflected diffusions are used for currency exchange rate target-zone models. In these models the exchange rate is allowed to float within two barriers enforced by the monetary authority, cf. [?].
- Reflected diffusions also appear as the payoff of the so-called "Russian options", cf. Shepp and Shiryaev [?].
- In mathematical biology, they appear in models for population dynamics in which the total number of individuals is affected by oppositely acting forces, cf. [25]. Examples are spontaneous growth and immigration on the one hand and random harvesting or predation on the other.
- Finally, reflected Brownian motion has been shown to describe queueing models experiencing heavy traffic, see [16, ?].

In all these models the observation times might not be equidistantly distributed. For instance, they depend on trading times in finance applications or on the measurement times of the biologist.

By the compactness of the interval and the reflecting boundary, the diffusion is ergodic and admits a spectral gap. Our procedure relies on a representation of the coefficients in terms of the invariant measure and the first non-trivial eigenpair of the infinitesimal generator of the diffusion. This spectral identification method was introduced in Hansen et al. [15] and has been further studied by [10].

It is crucial that the eigenpair is determined by two objects. The first is the transition operator of the time-changed diffusion, where the time change is given by the sampling distribution. The second is the Laplace transform of the sampling distribution. The former can be estimated by a wavelet projection method and the latter by classical empirical process theory.

As a side product of our analysis, we clarify some aspects of the estimator and the proofs by Gobet et al. [14]. In particular, their estimator needed a truncation to stabilize it against large stochastic errors, with a threshold value that is unknown in practice. We could omit this truncation.

Moreover, we show that Lepski's method can be applied to choose the projection level in a data-driven way. This allows us to adapt to the unknown Sobolev regularity of the drift and volatility coefficients of the diffusion. The first adaptive estimator based on low-frequency observations of a diffusion process has been constructed only recently in Söhl and Trabs [?]. That first result considers diffusions on the whole real line and is restricted to diffusions with constant volatility, which simplifies the whole estimation problem. In contrast, we do not need any additional restrictions on the drift or the volatility.

We prove that the estimators achieve minimax optimal convergence rates. The adaptive estimator only loses a logarithmic factor. In view of the cost of randomness determined by Aït-Sahalia and Mykland [?], it might be surprising that the convergence rates do not depend on the sampling distribution. In fact, they coincide with the nonparametric rates of the low-frequency setting. In that sense, our method is also adaptive with respect to the unknown sampling distribution.

As the simulations clearly show, there is, however, a large cost of ignoring the randomness in the misspecified case. This is the case where one applies the low-frequency estimator to randomly sampled observations, using the average time step as the observation distance.

The paper is organized as follows:

- In Section 2 we introduce the diffusion with reflecting boundaries, our basic assumptions and the main properties of the process.
- The estimators are constructed in Section 3.
- The main results on the convergence rates are stated and discussed in Section 4.
- The adaptive estimator is constructed in Section 5.
- The finite sample performance of the method is illustrated in a small simulation study in Section 6.
- The proofs of the upper and lower bounds, as well as those for the Lepski method, are postponed to Sections 7, 8 and 9, respectively.
- Finally, some results on the stability of the eigenvalue problems are presented in the appendix.


2 The model 

Without loss of generality we can consider the unit interval $[0,1]$ for the reflecting diffusion. For a measurable and bounded drift function $b\colon[0,1]\to\mathbb R$ and a continuous volatility function $\sigma\colon[0,1]\to\mathbb R_+$, let the process $X=\{X_t: t\ge0\}$ be given by the stochastic differential equation

$$dX_t=b(X_t)\,dt+\sigma(X_t)\,dW_t+v(X_t)\,dY_t(X),\qquad X_0=x_0,\quad X_t\in[0,1]\ \text{for all } t\ge0.\tag{1}$$

Here:

- $x_0$ is a random variable on $[0,1]$;
- $W=\{W_t: t\ge0\}$ is a standard Brownian motion;
- $v\colon[0,1]\to\mathbb R$ satisfies $v(0)=1$ and $v(1)=-1$;
- $Y$, which is part of the solution, is a non-anticipative, continuous, non-decreasing process that increases only when $X_t\in\{0,1\}$.

The Engelbert-Schmidt theorem guarantees a weak solution of (1): boundedness of the drift coefficient, together with continuity and strict positivity of the volatility function, ensures that (1) has a weak solution, see Rozkosz and Słomiński [26, Thm. 4.1]. We denote by $\mathbb P_{\sigma,b}$ the law of this solution on the canonical space $\Omega=C(\mathbb R_+,[0,1])$ of continuous functions. This space is equipped with the topology of uniform convergence on compact subsets and endowed with its Borel $\sigma$-field $\mathcal F$.

For $N\in\mathbb N$ our observations are given by

$$(0,X_0),\ (\tau_1,X_{\tau_1}),\ \dots,\ (\tau_N,X_{\tau_N})\in[0,\infty)\times[0,1],$$

where $\tau_1,\dots,\tau_N$ is an increasing sequence of random time points. For convenience we write $\tau_0=0$.

Assumption 1. Let the observation distances

$$\Delta_n:=\tau_n-\tau_{n-1},\qquad n=1,\dots,N,$$

be an independent and identically distributed sequence of strictly positive random variables with law

$$\gamma\in\Gamma:=\Gamma(I,\alpha):=\{\gamma\ \text{probability measure on}\ \mathbb R_+:\ \gamma(I)\ge\alpha\}$$

for some compact interval $I\subset(0,\infty)$ and some $\alpha\in(0,1]$. Let $(\Delta_n)$ be independent of the diffusion process $X$.

This condition on the sampling distributions is very weak. For every given positive distribution $\gamma$ there are $I,\alpha$ such that $\gamma\in\Gamma(I,\alpha)$. There are only two restrictions. First, the set $\Gamma$ has to be bounded in the right sense, since we will derive uniform rates over this class. Second, we have to exclude distributions that concentrate at zero. The latter condition is natural because otherwise the observations would be of high-frequency type, which would require a completely different analysis.

Example 2.

(i) The special case of low-frequency observations is covered by setting $\tau_n=n\Delta$ for some fixed deterministic $\Delta>0$. Then the sampling distribution is given by the Dirac measure in $\Delta$, that is, $\Gamma=\{\delta_\Delta\}$.

(ii) If the observation times are governed by a Poisson process, the waiting time to the next observation is exponentially distributed, that is, $\gamma=\operatorname{Exp}(\lambda)$ for some intensity $\lambda>0$. In this case we can choose $\Gamma=\{\operatorname{Exp}(\lambda):\lambda\in\Lambda\}$ for any compact set $\Lambda\subset(0,\infty)$. Note that $\Lambda$ must stay away from zero, since $\operatorname{Exp}(\lambda)(I)\to0$ as $\lambda\to0$.

To state the assumptions on the diffusion coefficients, we denote the $L^2([0,1])$-Sobolev space of order $s\ge0$ by $H^s:=H^s([0,1])$. Furthermore, let $H^s_\infty\subset H^s$ be the subset of bounded functions with Sobolev regularity $s$. Note that $H^s_\infty=H^s$ for $s>1/2$ by the Sobolev embeddings.

Assumption 3. For $s>1$ and constants $d,D>0$ let $(\sigma,b)\in\Theta_s$, where

$$\Theta_s:=\Theta_s(d,D)=\Big\{(\sigma,b)\in H^s\times H^{s-1}_\infty:\ \|\sigma^2\|_{H^s}\le D,\ \|b\|_{H^{s-1}}\le D,\ \inf_x\sigma(x)\ge d\Big\}.$$

In particular, $(\sigma,b)\in\Theta_s$ ensures the existence of a weak solution of (1). As shown by Gobet et al. [14], the compactness of $[0,1]$ and the reflecting boundary conditions imply that $X$ has a spectral gap. Thus $X$ is geometrically ergodic and admits an invariant measure $\mu$. Focusing on asymptotic results, we can suppose that the initial value $x_0$ is distributed according to $\mu$. Assumption 3 implies that $\mu$ has a Lebesgue density. Abusing notation, we denote it by $\mu$ as well:

$$\mu(x):=\mu_{\sigma,b}(x)=C_0\,\sigma^{-2}(x)\exp\Big(\int_0^x2b(y)\sigma^{-2}(y)\,dy\Big),\qquad x\in[0,1],\tag{2}$$

for some normalizing constant $C_0>0$, cf. Bass [?, Chap. 4] or Karlin and Taylor [?, Chap. 15, Sect. 6].

It is easy to see that the regularity assumptions on $b$ and $\sigma$ imply $\mu\in H^s$, which will be essential for the analysis of the estimators. Moreover, it follows from the explicit formula for $\mu$ that there are constants $0<c<C$ such that $c\le\mu\le C$ for any $(\sigma,b)\in\Theta_s$. Consequently, $L^2(\mu)$ with the inner product

$$\langle f,g\rangle_\mu:=\int_0^1f(x)g(x)\mu(x)\,dx$$

is a Hilbert space equivalent to $L^2([0,1])$.

Reflection corresponds to Neumann boundary conditions. Hence the infinitesimal generator $L=L_{\sigma,b}$ of the diffusion $X$ is an unbounded, densely defined operator on $L^2([0,1])$ satisfying

$$Lf(x)=b(x)f'(x)+\tfrac12\sigma^2(x)f''(x),\qquad \operatorname{dom}(L)=\{f\in H^2([0,1]):\ f'(0)=f'(1)=0\}.$$

Furthermore, seen as an operator on the Hilbert space $L^2(\mu)$, the generator $L$ is an elliptic, self-adjoint operator with compact resolvent, see Chatelin [9, Example 4.21]. Consequently, it has a pure point spectrum $\sigma(L)=\{\nu_k: k=0,1,\dots\}$, and the corresponding eigenfunctions $u_k$ form an $L^2(\mu)$-orthogonal basis. Its largest eigenvalue $\nu_0$ equals $0$ with a constant eigenfunction. All other eigenvalues are negative, and we order them decreasingly: $0>\nu_1>\nu_2>\dots$. As shown in [13, Lemma 6.1], the eigenvalue $\nu_1$ is simple and the eigenfunction $u_1$ can be chosen strictly increasing.


3 Estimation method 


3.1 Spectral identification 


The main idea behind the spectral estimators in [14] is that the coefficients of a stationary diffusion process can be expressed in terms of the invariant density $\mu$ and any nontrivial eigenpair $(\nu_k,u_k)$, $k\ge1$. Indeed, expressing the invariant measure in terms of the speed measure together with the Neumann boundary conditions yields, cf. [14, Sect. 3.1],

$$\sigma^2(x)=\frac{2\nu_k\int_0^xu_k(y)\mu(y)\,dy}{u_k'(x)\mu(x)},\tag{3}$$

$$b(x)=\frac{\nu_ku_k(x)}{u_k'(x)}-\frac{\sigma^2(x)u_k''(x)}{2u_k'(x)}=\nu_k\,\frac{u_k(x)u_k'(x)\mu(x)-u_k''(x)\int_0^xu_k(y)\mu(y)\,dy}{u_k'(x)^2\mu(x)}.\tag{4}$$
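A short worked example shows how (3) and (4) recover the coefficients. The special case chosen here is an illustration only and is not taken from the analysis above: reflected Brownian motion with $b\equiv0$ and $\sigma\equiv1$. Then $\mu\equiv1$, $\nu_1=-\pi^2/2$ and $u_1(x)=-\sqrt2\cos(\pi x)$. This $u_1$ is increasing, satisfies the Neumann conditions and is normalized in $L^2([0,1])$. We have $\int_0^xu_1(y)\,dy=-\sqrt2\sin(\pi x)/\pi$, $u_1'(x)=\sqrt2\pi\sin(\pi x)$ and $u_1''(x)=\sqrt2\pi^2\cos(\pi x)$. Hence, for $x\in(0,1)$, formulas (3) and (4) give

$$\sigma^2(x)=\frac{2\cdot(-\tfrac{\pi^2}{2})\cdot\big(-\sqrt2\sin(\pi x)/\pi\big)}{\sqrt2\pi\sin(\pi x)}=1,\qquad b(x)=-\frac{\pi^2}{2}\cdot\frac{-2\pi\sin(\pi x)\cos(\pi x)+2\pi\sin(\pi x)\cos(\pi x)}{2\pi^2\sin^2(\pi x)}=0,$$

as it should be.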


Applying the ergodicity, it is easy to estimate the invariant measure $\mu$. To recover an eigenpair of the generator, Gobet et al. [14] have used discrete equidistant observations, i.e. $\Delta_n=\Delta$ for some fixed $\Delta>0$. From these they constructed a matrix estimator of the transition operator $P_\Delta=e^{\Delta L}$. The operator $P_\Delta$ shares its eigenfunctions with the generator $L$, while its eigenvalues are $e^{\Delta\nu_k}$, $k=0,1,\dots$. Using this, they obtained estimators of $(\nu_k,u_k)$. We will generalize these results, taking into account the random observation times $\tau_1,\dots,\tau_N$.

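Similar to the transition operator $P_\Delta$, we introduce the generalized transition operator $R$ on $L^2(\mu)$ given by

$$Rf(x)=\mathbb E_{\sigma,b,\gamma}\big[f(X_\tau)\mid X_0=x\big],\qquad x\in[0,1],\tag{5}$$

where $\tau$ is a random variable with distribution $\gamma$, independent of the process $X$. The crucial insight is the following. Conditioning on $\tau$ and using $P_tu_k=e^{t\nu_k}u_k$, we obtain for any eigenpair $(\nu_k,u_k)$ of the generator

$$Ru_k(x)=\mathbb E_\gamma\big[P_\tau u_k(x)\big]=\mathbb E_\gamma\big[e^{\tau\nu_k}\big]\,u_k(x)=\mathcal L_\gamma(-\nu_k)\,u_k(x),\tag{6}$$

where

$$\mathcal L_\gamma(z):=\int_0^\infty e^{-tz}\,\gamma(dt),\qquad z\in\mathbb R_+,\tag{7}$$

is the Laplace transform of $\gamma$. The Laplace transform $\mathcal L_\gamma$ is strictly decreasing. Moreover, $\kappa_k=\mathcal L_\gamma(-\nu_k)\to0$ as $\nu_k\to-\infty$, since $\gamma$ has no atom at zero. Consequently, $R$ is a compact operator with eigenvalues $1=\kappa_0>\kappa_1>\kappa_2>\kappa_3>\dots>0$. In the functional calculus sense we obtain

$$R=\mathcal L_\gamma(-L).$$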


Therefore, we can estimate the eigenpairs $(\nu_k,u_k)$ using the spectral properties of $R$. Since the sampling distribution $\gamma$ is unknown, we need to estimate the Laplace transform from the observations $(\Delta_n)_{n=1,\dots,N}$.

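Example 2 (continued). (i) For $\Delta_n=\Delta$ with some fixed $\Delta>0$ we have $Rf=P_\Delta f$ and $\mathcal L_\gamma(z)=e^{-\Delta z}$, $z\ge0$. We thus exactly recover the situation studied in [14].

(ii) If $\Delta_n\sim\operatorname{Exp}(\lambda)$, then the Laplace transform is given by $\mathcal L_\gamma(z)=\int_0^\infty\lambda e^{-t(z+\lambda)}\,dt=\frac{\lambda}{z+\lambda}$, $z\ge0$. The operator $R=\lambda(\lambda-L)^{-1}$ is then, up to the factor $\lambda$, the resolvent of the generator $L$.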
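A minimal numerical sketch of Example 2 follows. It assumes NumPy, and all function names are hypothetical. Two further choices are arbitrary illustrations: the intensity $\lambda=4$ (mean sampling distance $0.25$) and reflected Brownian motion ($b\equiv0$, $\sigma\equiv1$, so $\nu_k=-\pi^2k^2/2$). The sketch compares the empirical Laplace transform $\frac1N\sum_ne^{-z\Delta_n}$ with the closed forms from (i) and (ii). It then maps the eigenvalues $\nu_k$ to the eigenvalues $\kappa_k=\mathcal L_\gamma(-\nu_k)$ of $R$ via (6).

```python
import numpy as np

def empirical_laplace(deltas, z):
    """Empirical Laplace transform (1/N) * sum_n exp(-z * Delta_n), vectorized in z."""
    deltas = np.asarray(deltas, dtype=float)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    return np.exp(-np.outer(z, deltas)).mean(axis=1)

def laplace_dirac(delta, z):
    """Example 2 (i): gamma is the Dirac measure in delta."""
    return np.exp(-delta * np.asarray(z, dtype=float))

def laplace_exp(lam, z):
    """Example 2 (ii): gamma = Exp(lam), Laplace transform lam / (z + lam)."""
    return lam / (np.asarray(z, dtype=float) + lam)

rng = np.random.default_rng(0)
lam, n_obs = 4.0, 20_000
deltas = rng.exponential(scale=1.0 / lam, size=n_obs)  # i.i.d. sampling distances

z = np.linspace(0.0, 10.0, 21)
sup_err = np.max(np.abs(empirical_laplace(deltas, z) - laplace_exp(lam, z)))

# Eigenvalues of R for reflected Brownian motion: kappa_k = L_gamma(-nu_k), nu_k = -pi^2 k^2 / 2
k = np.arange(0, 5)
nu = -np.pi**2 * k**2 / 2
kappa_exp = laplace_exp(lam, -nu)       # Poisson sampling
kappa_dirac = laplace_dirac(0.25, -nu)  # equidistant sampling with Delta = 0.25

print(f"sup |L_hat - L| on [0, 10]: {sup_err:.4f}")
print("kappa_k (exponential):", np.round(kappa_exp, 4))
print("kappa_k (Dirac):      ", np.round(kappa_dirac, 4))
```

In both schemes $\kappa_0=1$ belongs to the constant eigenfunction. The remaining $\kappa_k$ decrease strictly, which is what allows the first nontrivial eigenpair to be singled out.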

The distribution of the eigenvalues of the operator $R$ is inherited from the generator $L$ and the sampling distribution $\gamma$. More precisely, we obtain the following lemma, whose proof is postponed to Section 7.

Lemma 4. Grant Assumptions 1 and 3. The spectral gap of the generalized transition operator $R$, that is $\inf_{i\ne1}|\kappa_i-\kappa_1|$, is bounded from below uniformly in $(\sigma,b)\in\Theta_s$ and $\gamma\in\Gamma$. The same holds for each eigenvalue $\kappa_k$, $k\ge1$, of $R$.


3.2 Construction of the estimators 


Let us fix some notation. We write $f\lesssim g$ (resp. $g\gtrsim f$) if $f\le C\cdot g$ for some universal constant $C>0$, and $f\sim g$ if both $f\lesssim g$ and $g\lesssim f$. Let $(\psi_\lambda)$, with multi-indices $\lambda=(j,k)$, be an $L^2$-orthonormal regular wavelet basis of $L^2([0,1])$. The corresponding approximation spaces are given by

$$V_J:=\operatorname{span}\{\psi_\lambda:\ |\lambda|=|(j,k)|:=j\le J\}.$$

The $L^2$-orthogonal and the $L^2(\mu)$-orthogonal projections onto $V_J$ are denoted by $\pi_J$ and $\pi_J^\mu$, respectively.

In fact, the approximation spaces do not necessarily need to be generated by wavelets. We only require that $V_J$, $J\in\mathbb N$, satisfy Jackson and Bernstein type inequalities with respect to the Sobolev spaces $H^s$. That is, for all $0\le t\le s$, $f\in H^s$ and $g\in V_J$,

$$\|(I-\pi_J)f\|_{H^t}\lesssim2^{-J(s-t)}\|f\|_{H^s}\quad\text{and}\quad\|g\|_{H^j}\lesssim2^{Jj}\|g\|_{L^2},\quad j=1,2.\tag{8}$$

Additionally, we need the uniform bound

$$\Big\|\sum_{|\lambda|\le J}\psi_\lambda^2\Big\|_\infty\lesssim\dim(V_J)=2^J.\tag{9}$$

It follows from the well-known properties of wavelets that (8) and (9) are satisfied.
Remark 5. The eigenfunctions of the generator of reflected Brownian motion are trigonometric functions. It therefore seems attractive to choose $V_J$ as the span of the first $2^J$ orthogonal trigonometric basis functions, which, however, do not fulfill (8). Suppose the drift and the volatility function satisfy the stronger Hölder regularity assumption $\|\sigma^2\|_{C^s},\|b\|_{C^{s-1}}\le D$, where $\|\cdot\|_{C^s}$ denotes the Hölder norm. Then we can obtain the same bounds on the mean $L^2$ estimation error under a weaker version of Jackson's inequality, namely

$$\|(I-\pi_J)f\|_{H^t}\lesssim2^{-J(s-t)}\|f\|_{C^s}.$$

This inequality is satisfied for the trigonometric basis. Furthermore, Bernstein's inequality can be easily checked, and (9) is trivially fulfilled. The same applies to the B-spline basis, which satisfies the above conditions with the weakened Jackson inequality (see [?] and [12]).

After having fixed the basis functions and the corresponding approximation spaces $V_J$, there is a one-to-one correspondence between two objects. The first is a linear operator $A\colon V_J\to V_J$ on the finite dimensional space $V_J$. The second is its matrix representation $(A_{\lambda,\lambda'})\in\mathbb R^{\dim V_J\times\dim V_J}$ with $A_{\lambda,\lambda'}:=\langle\psi_\lambda,A\psi_{\lambda'}\rangle$. To simplify the notation, we will throughout use $A$ to denote both the operator and its representation matrix.

The diffusion $X$ is ergodic and independent of $(\Delta_n)_n$, so the sequence $(X_{\tau_n})_n$ is ergodic, too. The natural estimator for the invariant measure is therefore the empirical measure

$$\mu_N:=\frac1{N+1}\sum_{n=0}^N\delta_{X_{\tau_n}}.$$

To regularize $\mu_N$, we define the projection estimator

$$\hat\mu_J:=\sum_{|\lambda|\le J}\langle\psi_\lambda,\mu_N\rangle\,\psi_\lambda\quad\text{with}\quad\langle\psi_\lambda,\mu_N\rangle:=\frac1{N+1}\sum_{n=0}^N\psi_\lambda(X_{\tau_n})$$

for a projection level $J\in\mathbb N$. We proceed similarly to Gobet et al. [14]. Extending the matrix estimator of the transition semigroup, we introduce the matrix estimator $\hat R_J=(\hat R_{\lambda,\lambda'})$ of the action of the operator $R$ from (5) on the wavelet basis, with respect to the scalar product $\langle\cdot,\cdot\rangle_\mu$:

$$\hat R_{\lambda,\lambda'}:=\frac1{2N}\sum_{n=0}^{N-1}\Big(\psi_\lambda(X_{\tau_{n+1}})\psi_{\lambda'}(X_{\tau_n})+\psi_{\lambda'}(X_{\tau_{n+1}})\psi_\lambda(X_{\tau_n})\Big).$$

The observation times are independent of the diffusion. Conditioning on $(\tau_n)$, we can therefore verify that $\hat R_J$ is an unbiased estimator of the action of the operator $R$ on the basis, that is,

$$\mathbb E_{\sigma,b,\gamma}\big[\hat R_{\lambda,\lambda'}\big]=\langle\psi_\lambda,R\psi_{\lambda'}\rangle_\mu.$$
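The projection estimator $\hat\mu_J$ can be sketched in a few lines. This sketch assumes NumPy and makes two hypothetical simplifications. First, it replaces the wavelet basis by the cosine basis $1,\sqrt2\cos(j\pi x)$ used in Section 6 (cf. Remark 5). Second, it draws the sample i.i.d. from a Beta(2,2) law rather than from the Markov chain $(X_{\tau_n})$. The function names are illustrative only.

```python
import numpy as np

def cosine_basis(x, J):
    """Orthonormal cosine basis on [0,1]: columns 1, sqrt(2)cos(pi x), ..., sqrt(2)cos(J pi x)."""
    x = np.asarray(x, dtype=float)
    psi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(J + 1)))
    psi[:, 0] = 1.0
    return psi

def projection_density(obs, J, grid):
    """hat mu_J(grid) = sum_j <psi_j, mu_N> psi_j(grid) with empirical coefficients."""
    coef = cosine_basis(obs, J).mean(axis=0)  # (1/(N+1)) sum_n psi_j(X_{tau_n})
    return cosine_basis(grid, J) @ coef

rng = np.random.default_rng(1)
obs = rng.beta(2.0, 2.0, size=20_001)   # stand-in for X_0, X_{tau_1}, ..., X_{tau_N}
M = 1000
grid = (np.arange(M) + 0.5) / M         # midpoint grid on [0, 1]
mu_hat = projection_density(obs, J=8, grid=grid)
mu_true = 6.0 * grid * (1.0 - grid)     # Beta(2,2) density

l2_err = np.sqrt(np.mean((mu_hat - mu_true) ** 2))
print(f"L2 error: {l2_err:.4f}, integral of mu_hat: {mu_hat.mean():.6f}")
```

Since only the constant basis function has nonzero integral, $\hat\mu_J$ integrates to one by construction. The projection level $J$ trades the bias of the truncated expansion against the variance of the empirical coefficients.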

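The Gram matrix $G_J=(\langle\psi_\lambda,\psi_{\lambda'}\rangle_\mu)_{\lambda,\lambda'}\in\mathbb R^{\dim V_J\times\dim V_J}$ is determined by $\langle v,G_Jv\rangle=\langle v,v\rangle_\mu$ for all $v\in V_J$. Hence, $G_J$ is the restriction of the scalar product $\langle\cdot,\cdot\rangle_\mu$ to the finite dimensional space $V_J$. It can be estimated by $\hat G_J=(\hat G_{\lambda,\lambda'})$ with

$$\hat G_{\lambda,\lambda'}:=\frac1N\Big(\frac12\psi_\lambda(X_0)\psi_{\lambda'}(X_0)+\sum_{n=1}^{N-1}\psi_\lambda(X_{\tau_n})\psi_{\lambda'}(X_{\tau_n})+\frac12\psi_\lambda(X_{\tau_N})\psi_{\lambda'}(X_{\tau_N})\Big),$$

satisfying

$$\mathbb E_{\sigma,b,\gamma}\big[\hat G_{\lambda,\lambda'}\big]=\langle\psi_\lambda,\psi_{\lambda'}\rangle_\mu=(G_J)_{\lambda,\lambda'}.$$

Owing to $\langle v,G_Jv\rangle=\langle v,v\rangle_\mu>0$ for any $v\in V_J\setminus\{0\}$, the matrix $G_J$ is invertible. By construction, $\langle v,\hat G_Jv\rangle$ is always non-negative. It is strictly positive whenever the sample is sufficiently dispersed over the whole interval $[0,1]$, and by ergodicity we can expect this to be a high-probability event. With a Neumann series argument we can moreover bound the norm of $\hat G_J^{-1}$. This is stated in the following lemma, which is proven in Section 7.4.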
Lemma 6. Grant Assumptions 1 and 3. On the event

$$\mathcal T:=\Big\{\|\hat G_J-G_J\|_{L^2}\le\tfrac12\|G_J^{-1}\|_{L^2}^{-1}\Big\}$$

the estimator $\hat G_J$ is invertible and satisfies $\|\hat G_J^{-1}\|_{L^2}\le2\|G_J^{-1}\|_{L^2}$. Moreover, $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T)\lesssim N^{-1}2^{2J}$ holds uniformly over $\Theta_s$ and $\Gamma$.
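The Neumann series argument behind the first part of Lemma 6 can be checked numerically. Writing $\hat G_J=G_J(I+G_J^{-1}(\hat G_J-G_J))$ with $\|G_J^{-1}(\hat G_J-G_J)\|\le\frac12$ on $\mathcal T$ gives $\|\hat G_J^{-1}\|\le\|G_J^{-1}\|/(1-\frac12)$. The sketch assumes NumPy and uses random symmetric positive definite matrices as hypothetical stand-ins for $G_J$. It perturbs each one by a matrix whose spectral norm is exactly $\frac12\|G_J^{-1}\|^{-1}$.

```python
import numpy as np

def neumann_bound_holds(dim, rng):
    """Check ||(G + E)^{-1}|| <= 2 ||G^{-1}|| when ||E|| = ||G^{-1}||^{-1} / 2 (spectral norms)."""
    A = rng.standard_normal((dim, dim))
    G = A @ A.T + 0.1 * np.eye(dim)                   # symmetric positive definite "Gram matrix"
    G_inv_norm = np.linalg.norm(np.linalg.inv(G), 2)
    E = rng.standard_normal((dim, dim))
    E *= 0.5 / (G_inv_norm * np.linalg.norm(E, 2))    # rescale to the boundary of the event T
    G_hat = G + E                                     # perturbed "estimate"
    return np.linalg.norm(np.linalg.inv(G_hat), 2) <= 2.0 * G_inv_norm * (1 + 1e-10)

rng = np.random.default_rng(2)
results = [neumann_bound_holds(dim, rng) for dim in (4, 8, 16, 32) for _ in range(25)]
print(f"bound held in {sum(results)} of {len(results)} trials")
```

The small relative tolerance only absorbs floating point rounding. Mathematically the bound holds for every perturbation on the event $\mathcal T$, symmetric or not.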

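Whenever $\hat G_J^{-1}$ exists, we can consider $\hat G_J^{-1}\hat R_J$. Since $\hat R_J$ is symmetric, it immediately follows that $\hat G_J^{-1}\hat R_J$ is symmetric with respect to the $\hat G_J$-scalar product. Furthermore, we apply the Cauchy-Schwarz inequality and the inequality between geometric and arithmetic means. This gives, for all $v\in V_J\setminus\{0\}$,

$$|\langle\hat R_Jv,v\rangle|=\Big|\frac1N\sum_{n=0}^{N-1}v(X_{\tau_n})v(X_{\tau_{n+1}})\Big|\le\frac1N\Big(\sum_{n=0}^{N-1}v^2(X_{\tau_n})\Big)^{1/2}\Big(\sum_{n=1}^{N}v^2(X_{\tau_n})\Big)^{1/2}\le\frac1N\Big(\frac12v^2(X_0)+\frac12v^2(X_{\tau_N})+\sum_{n=1}^{N-1}v^2(X_{\tau_n})\Big)=\langle\hat G_Jv,v\rangle.$$

Consequently, all eigenvalues of the matrix $\hat G_J^{-1}\hat R_J$ are real and not larger than one. It is easy to check that 1 is an eigenvalue corresponding to the constant function.

We define the estimator $(\hat\kappa_{J,1},\hat u_{J,1})$ of the eigenpair $(\kappa_1,u_1)$ as the eigenpair of the matrix $\hat G_J^{-1}\hat R_J$ corresponding to the largest eigenvalue smaller than one. On the exceptional event that $\hat G_J$ is not invertible, we set $\hat\kappa_{J,1}=0$ and $\hat u_{J,1}=1$. Furthermore, we choose the estimated eigenfunction $\hat u_{J,1}$ normalized in $L^2$.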
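A compact sketch of $\hat R_J$, $\hat G_J$ and the eigenpair selection follows, assuming NumPy and SciPy. Some choices are hypothetical and serve illustration only:

- the cosine basis of Section 6 is used instead of wavelets;
- the diffusion is reflected Brownian motion ($b\equiv0$, $\sigma\equiv1$), which can be sampled exactly by folding a free Brownian motion into $[0,1]$;
- the sampling times are $\operatorname{Exp}(4)$, so the target is $\kappa_1=4/(4+\pi^2/2)$ and $u_1=-\sqrt2\cos(\pi x)$.

`scipy.linalg.eigh(a, b)` solves the generalized symmetric problem $\hat R_Jv=\kappa\hat G_Jv$, which is equivalent to the eigenproblem for $\hat G_J^{-1}\hat R_J$.

```python
import numpy as np
from scipy.linalg import eigh

def cosine_basis(x, J):
    x = np.asarray(x, dtype=float)
    psi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(J + 1)))
    psi[:, 0] = 1.0
    return psi

def matrix_estimators(X, J):
    """X = (X_0, X_{tau_1}, ..., X_{tau_N}); returns the matrices (R_hat, G_hat)."""
    N = len(X) - 1
    P = cosine_basis(X, J)                  # P[n, l] = psi_l(X_{tau_n})
    cross = P[:-1].T @ P[1:]                # sum_n psi_l(X_{tau_n}) psi_l'(X_{tau_{n+1}})
    R_hat = (cross + cross.T) / (2.0 * N)   # symmetrized transition estimator
    w = np.ones(N + 1)
    w[0] = w[-1] = 0.5                      # weights 1/2 at n = 0 and n = N
    G_hat = (P * w[:, None]).T @ P / N
    return R_hat, G_hat

def first_eigenpair(R_hat, G_hat):
    """Largest eigenvalue below the trivial eigenvalue 1; eigenvector normalized in L^2."""
    vals, vecs = eigh(R_hat, G_hat)         # ascending order; vals[-1] = 1 (constants)
    kappa, v = vals[-2], vecs[:, -2]
    v = v / np.linalg.norm(v)               # orthonormal basis: L^2 norm = Euclidean norm
    ends = cosine_basis([0.0, 1.0], len(v) - 1) @ v
    if ends[1] < ends[0]:                   # sign convention: u_hat increasing
        v = -v
    return kappa, v

# Exact sampling of reflected Brownian motion at Exp(4) random times.
rng = np.random.default_rng(3)
N, lam = 20_000, 4.0
deltas = rng.exponential(1.0 / lam, size=N)
Y = rng.uniform() + np.concatenate(([0.0], np.cumsum(np.sqrt(deltas) * rng.standard_normal(N))))
Y = np.mod(Y, 2.0)
X = np.where(Y > 1.0, 2.0 - Y, Y)           # fold into [0, 1]

kappa_hat, u_coef = first_eigenpair(*matrix_estimators(X, J=4))
kappa_true = lam / (lam + np.pi**2 / 2)
print(f"kappa_hat = {kappa_hat:.4f}, kappa_1 = {kappa_true:.4f}")
print("coefficients of u_hat:", np.round(u_coef, 3))  # theoretical target: (0, -1, 0, 0, 0)
```

Taking the second largest generalized eigenvalue relies on the bound $|\langle\hat R_Jv,v\rangle|\le\langle\hat G_Jv,v\rangle$ above. That bound guarantees that the trivial eigenvalue 1 is the largest one.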

Using $\hat\kappa_{J,1}$ and the identification equation $\kappa_1=\mathcal L_\gamma(-\nu_1)$, we can estimate $\nu_1$. The canonical estimator for the Laplace transform of $\gamma$ is the Laplace transform of the empirical measure of the sampling distances $\Delta_n=\tau_n-\tau_{n-1}$, $n=1,\dots,N$. Hence, we define

$$\hat{\mathcal L}(z):=\frac1N\sum_{n=1}^Ne^{-z\Delta_n},\qquad z\ge0.$$


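Due to the i.i.d. structure of $(\Delta_n)$, classical empirical process theory shows that $\hat{\mathcal L}$ estimates $\mathcal L_\gamma$ uniformly in a neighborhood of $-\nu_1$ with the parametric rate $N^{-1/2}$. Moreover, $\hat{\mathcal L}$ is strictly decreasing and continuous on $[0,\infty)$, and thus invertible on its range $(0,1]$. We define

$$\hat\nu_{J,1}:=-\hat{\mathcal L}^{-1}(\hat\kappa_{J,1})\,\mathbf 1_{\{\hat\kappa_{J,1}>0\}}.\tag{10}$$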
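The inversion in (10) is a one-dimensional root-finding problem, because $\hat{\mathcal L}$ is strictly decreasing with $\hat{\mathcal L}(0)=1$. The following sketch assumes NumPy and SciPy's `brentq`. It finds a bracketing interval by doubling, and its function names are hypothetical. For the demonstration, the exact $\kappa_1$ of reflected Brownian motion under $\operatorname{Exp}(4)$ sampling is plugged in, so the output should be close to $\nu_1=-\pi^2/2$.

```python
import numpy as np
from scipy.optimize import brentq

def empirical_laplace(deltas, z):
    """L_hat(z) = (1/N) sum_n exp(-z * Delta_n) for a scalar z."""
    return float(np.mean(np.exp(-z * deltas)))

def nu_hat(kappa_hat, deltas):
    """nu_hat = -L_hat^{-1}(kappa_hat) * 1{kappa_hat > 0}, cf. (10)."""
    deltas = np.asarray(deltas, dtype=float)
    if kappa_hat <= 0.0:
        return 0.0
    if kappa_hat >= 1.0:
        raise ValueError("kappa_hat must lie in (0, 1) for the nontrivial eigenpair")

    def f(z):
        return empirical_laplace(deltas, z) - kappa_hat   # strictly decreasing, f(0) > 0

    upper = 1.0
    while f(upper) > 0.0:                                 # L_hat(z) -> 0 as z -> infinity
        upper *= 2.0
    return -brentq(f, 0.0, upper)

rng = np.random.default_rng(4)
lam = 4.0
deltas = rng.exponential(1.0 / lam, size=20_000)
kappa_1 = lam / (lam + np.pi**2 / 2)   # exact kappa_1 for reflected BM with Exp(4) sampling
print(f"nu_hat = {nu_hat(kappa_1, deltas):.3f}, nu_1 = {-np.pi**2 / 2:.3f}")
```

The error of $\hat\nu_{J,1}$ is the error of $\hat{\mathcal L}$ divided by the slope of $\mathcal L_\gamma$ at $-\nu_1$. This is why the parametric rate of $\hat{\mathcal L}$ carries over to the eigenvalue.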


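With the above definitions and in view of the identification formulas (3) and (4), we can define the plug-in estimators of the diffusion coefficients. In order to ensure integrability of our estimators, we need to stabilize them against large stochastic errors. We use the prior knowledge that $(\sigma,b)\in\Theta_s$, in particular $\|\sigma^2\|_\infty\le D$ and $\|b\|_{L^2}\le D$ for some $D>0$. We thus define

$$\hat\sigma_J^2(x):=\frac{2\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy}{\hat u_{J,1}'(x)\hat\mu_J(x)}\wedge D,\tag{11}$$

$$\hat b_J(x):=\tilde b_J(x)\,\mathbf 1_{\{\|\tilde b_J\|_{L^2}\le2D\}}\quad\text{for}\quad\tilde b_J(x):=\frac{\hat\nu_{J,1}\hat u_{J,1}(x)}{\hat u_{J,1}'(x)}-\frac{\hat\sigma_J^2(x)\hat u_{J,1}''(x)}{2\hat u_{J,1}'(x)},\tag{12}$$

where $\wedge$ denotes the minimum.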
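The plug-in step (11)-(12) only needs $\hat\mu_J$, $\hat u_{J,1}$ and its first two derivatives, and the integral $\int_0^x\hat u_{J,1}\hat\mu_J$. The sketch below makes these assumptions:

- NumPy;
- functions stored as coefficient vectors in the cosine basis of Section 6, so that derivatives are available in closed form;
- a cumulative trapezoidal rule for the integral.

As a check, it feeds in the exact quantities of reflected Brownian motion: $\mu\equiv1$, $u_1=-\sqrt2\cos(\pi x)$ and $\nu_1=-\pi^2/2$. For these the output should be $\sigma^2\equiv1$ and $\tilde b\equiv0$ on $[0.1,0.9]$. The helper names are hypothetical.

```python
import numpy as np

def cosine_series(coef, x, deriv=0):
    """Evaluate f = c_0 + sum_j c_j sqrt(2) cos(j pi x), or its first/second derivative."""
    x = np.asarray(x, dtype=float)
    w = np.arange(len(coef)) * np.pi
    if deriv == 0:
        psi = np.sqrt(2.0) * np.cos(np.outer(x, w))
        psi[:, 0] = 1.0
    elif deriv == 1:
        psi = -np.sqrt(2.0) * w * np.sin(np.outer(x, w))
    else:
        psi = -np.sqrt(2.0) * w**2 * np.cos(np.outer(x, w))
    return psi @ np.asarray(coef, dtype=float)

def plug_in(mu_coef, u_coef, nu, D, grid):
    """sigma2_hat from (11) and b_tilde from (12) on a grid that starts at 0."""
    mu = cosine_series(mu_coef, grid)
    u, du, d2u = (cosine_series(u_coef, grid, k) for k in range(3))
    g = u * mu
    # cumulative trapezoidal rule for int_0^x u(y) mu(y) dy
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(grid))))
    with np.errstate(divide="ignore", invalid="ignore"):   # u' vanishes at the boundary
        sigma2 = np.minimum(2.0 * nu * integral / (du * mu), D)   # (11), truncated at D
        b_tilde = nu * u / du - sigma2 * d2u / (2.0 * du)          # (12) before the indicator
    return sigma2, b_tilde

grid = np.linspace(0.0, 1.0, 2001)
sigma2, b_tilde = plug_in([1.0, 0.0, 0.0], [0.0, -1.0, 0.0], -np.pi**2 / 2, D=2.0, grid=grid)
inner = (grid >= 0.1) & (grid <= 0.9)
print(f"max |sigma2 - 1| on [0.1, 0.9]: {np.max(np.abs(sigma2[inner] - 1.0)):.2e}")
print(f"max |b_tilde|    on [0.1, 0.9]: {np.max(np.abs(b_tilde[inner])):.2e}")
```

The division by $\hat u_{J,1}'$, which vanishes at $0$ and $1$, is exactly the boundary instability discussed at the beginning of Section 4.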


4 Minimax convergence rates 


Let us now state our first main results, generalizing Theorems 2.4 and 2.5 in [14], respectively. Note that, since $u_1'(0)=u_1'(1)=0$, the function

$$[0,1]\ni x\mapsto\frac{2\nu_1\int_0^xu_1(y)\mu(y)\,dy}{u_1'(x)\mu(x)}=\sigma^2(x)$$

is defined in $\{0,1\}$ only via continuous extension. Consequently, the proposed estimators $\hat\sigma_J^2$ and $\hat b_J$ might be unstable at the boundary. We restrict the $L^2$-loss to an interval $[a,b]\subset[0,1]$ for $0<a<b<1$ and refer to [3, Section 3.3.8] for a discussion of the boundary problem.


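Theorem 7. Grant Assumptions 1 and 3 for some $s>1$. Let $0<a<b<1$. Choosing $2^J\sim N^{1/(2s+3)}$, we have

$$\sup_{(\sigma^2,b,\gamma)\in\Theta_s\times\Gamma}\mathbb E_{\sigma,b,\gamma}\big[\|\hat\sigma_J^2-\sigma^2\|_{L^2([a,b])}^2\big]\lesssim N^{-2s/(2s+3)},$$

$$\sup_{(\sigma^2,b,\gamma)\in\Theta_s\times\Gamma}\mathbb E_{\sigma,b,\gamma}\big[\|\hat b_J-b\|_{L^2([a,b])}^2\big]\lesssim N^{-2(s-1)/(2s+3)}.$$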


The risk of $\hat\sigma_J^2$ and $\hat b_J$ decomposes into the errors for estimating the invariant density $\mu$ and the eigenpair $(\nu_1,u_1)$ of the infinitesimal generator $L$ of the diffusion. In view of formula (2), the invariant density inherits Sobolev regularity of degree $s$ from the diffusion coefficients. Together with the ergodicity and the spectral gap, $\mu$ can be estimated with the rate $\mathbb E[\|\hat\mu_J-\mu\|_{L^2}^2]\lesssim N^{-2s/(2s+1)}$ if we choose $2^J\sim N^{1/(2s+1)}$, cf. Proposition ??.

Due to $\mathcal L_\gamma(-\nu_1)=\kappa_1$, estimating $\nu_1$ reduces to two subproblems. The first is estimating the eigenvalue $\kappa_1$ of the operator $R$. The second is estimating the inverse of the Laplace transform $\mathcal L_\gamma$ in a neighborhood of $\kappa_1$. The latter can be solved with standard empirical process results, yielding the parametric rate $N^{-1/2}$ for $\hat{\mathcal L}$, see Lemma ??.

The analysis of the estimation error of the eigenpair of the generalized transition operator $R$ is the most challenging ingredient of our proofs. We first restrict the eigenvalue problem to the finite dimensional space $V_J$. That is, we find $(\kappa_{J,1},u_{J,1})\in\mathbb R_+\times V_J$ such that

$$\langle v,Ru_{J,1}\rangle_\mu=\kappa_{J,1}\langle v,u_{J,1}\rangle_\mu\quad\text{for all } v\in V_J.\tag{13}$$

As shown in Theorem 25, the resulting approximation error $\|u_1-u_{J,1}\|_{L^2(\mu)}+|\kappa_1-\kappa_{J,1}|$ is controlled by the spectral gap of the operator $R$ and the smoothness of the eigenfunction (of degree $s+1$). It achieves the order of magnitude $2^{-J(s+1)}$.

In the second step we approximate the finite dimensional problem (13) by a generalized symmetric eigenvalue problem for the random matrices $\hat R_J$ and $\hat G_J$. We use classical a posteriori error bounds to show that the approximation error is controlled by the norm of the so-called residual vector $r=(\hat R_J-\kappa_{J,1}\hat G_J)u_{J,1}$, cf. Theorem ??.

Write $R_J:=(\langle\psi_\lambda,R\psi_{\lambda'}\rangle_\mu)_{\lambda,\lambda'}$. In matrix form, (13) reads $R_Ju_{J,1}=\kappa_{J,1}G_Ju_{J,1}$, so that $r=(\hat R_J-R_J)u_{J,1}-\kappa_{J,1}(\hat G_J-G_J)u_{J,1}$. Hence $\|r\|_{L^2}$ can be bounded by the matrix approximation errors $\|(\hat R_J-R_J)u_{J,1}\|_{L^2}$ and $\|(\hat G_J-G_J)u_{J,1}\|_{L^2}$. These errors tend to zero by the mixing property of the Markov chain $(X_{\tau_n})_n$.

A delicate point is that the a posteriori technique gives an existence statement, but does not bound the error between ordered eigenpairs. We overcome this difficulty using the absolute Weyl theorem for generalized symmetric eigenvalue problems, see [21]. We conclude that $(\kappa_1,u_1)$ can be estimated with the rate $N^{-(s+1)/(2s+3)}$.

Because the volatility estimator relies on the first derivative of the eigenfunction, the statistical problem is ill-posed of degree one, deteriorating the rate to $N^{-s/(2s+3)}$. For the drift estimator we need the second derivative, which adds a degree of ill-posedness. At the same time the regularity of $b$ is smaller, such that the rate becomes $N^{-(s-1)/(2s+3)}=N^{-(s-1)/(2(s-1)+5)}$. Compared to Gobet et al. [14], random sampling times (with unknown sampling distribution) thus achieve the same rates as equidistant low-frequency observations. In fact, the convergence rates are optimal in the minimax sense:


Theorem 8. Grant Assumption 1 for an arbitrary $\gamma\in\Gamma$ admitting a bounded Lebesgue density at the origin. Grant Assumption 3 for some $s>1$. For $0<a<b<1$ it holds

$$\inf_{\hat\sigma^2}\sup_{(\sigma^2,b)\in\Theta_s}\mathbb E_{\sigma,b,\gamma}\big[\|\hat\sigma^2-\sigma^2\|_{L^2([a,b])}^2\big]\gtrsim N^{-2s/(2s+3)},$$

$$\inf_{\hat b}\sup_{(\sigma^2,b)\in\Theta_s}\mathbb E_{\sigma,b,\gamma}\big[\|\hat b-b\|_{L^2([a,b])}^2\big]\gtrsim N^{-2(s-1)/(2s+3)},$$

where the infimum is taken over all estimators $\hat\sigma^2$ and $\hat b$, respectively, i.e. over all measurable functions of the observations.
The proof of the lower bounds for observations sampled at random times follows the same strategy as for low-frequency observations in [14]. We construct alternatives that admit the same invariant measure. Via Assouad's lemma, proving the lower bound then reduces to a testing problem, see Tsybakov [?, Sect. 2.7.2]. The Kullback-Leibler divergence between the distributions of two alternatives can be bounded in terms of the $L^2$-distance between the kernels of the corresponding operators $R$ from (5). This bound is finally accomplished using Hilbert-Schmidt norm estimates and the explicit form of the inverse of the generator.
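For orientation, the exponents in Theorems 7 and 8 and the order of the projection level can be tabulated with exact rational arithmetic. This is an illustrative sketch assuming only the Python standard library. The helper names are hypothetical, and the constants hidden in $\sim$ and $\lesssim$ are ignored.

```python
from fractions import Fraction

def rate_exponents(s):
    """Exponents r in N^{-r} for the squared L^2 risks in Theorems 7 and 8."""
    s = Fraction(s)
    return {
        "sigma2": 2 * s / (2 * s + 3),
        "drift": 2 * (s - 1) / (2 * s + 3),
        "density": 2 * s / (2 * s + 1),  # invariant density, with 2^J ~ N^{1/(2s+1)}
    }

def projection_level(N, s):
    """Order of 2^J in Theorem 7, 2^J ~ N^{1/(2s+3)}, constants ignored."""
    return N ** (1.0 / (2.0 * float(s) + 3.0))

for s in (Fraction(3, 2), 2, 3):
    ex = rate_exponents(s)
    print(f"s = {s}: sigma2 {ex['sigma2']}, drift {ex['drift']}, density {ex['density']}, "
          f"2^J ~ {projection_level(20_000, s):.2f} for N = 20000")
```

The drift exponent equals the volatility exponent with $s$ replaced by $s-1$ and two additional degrees of ill-posedness, matching $N^{-(s-1)/(2(s-1)+5)}$ in the discussion above.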


5 Adaptive estimation 


The optimal choice of the projection level crucially depends on the unknown smoothness $s$. In this section, we construct a completely data-driven estimation procedure that adapts to the Sobolev regularity of $\sigma^2$ and $b$. We focus on the volatility estimator, noting that the methodology should extend to the drift estimation without additional theoretical problems. We adopt the general adaptation principle by Lepskii [?].

The aim is to choose the optimal projection level from the set

$$\mathcal J_N:=[J_{\min},J_{\max}]\cap\mathbb N\quad\text{with}\quad2^{J_{\min}}\sim\log N,\qquad2^{J_{\max}}\sim\frac{N}{(\log N)^2\log\log N}.$$

For any $J\in\mathcal J_N$ we define

$$s_J:=A\,\frac{2^{3J}\log\log N}{N}\tag{14}$$

for some appropriate constant $A>0$ depending on $d,D$ as well as $I,\alpha$ (but not on $s$) from Assumptions 1 and 3. The quantity $s_J$ is an upper bound for the stochastic error of $\hat\sigma_J^2$, cf. Corollary ??. The adaptive estimator is defined by

$$\hat\sigma^2:=\hat\sigma_{\hat J}^2\quad\text{with}\quad\hat J:=\min\Big\{J\in\mathcal J_N:\ \max_{K>J,\,K\in\mathcal J_N}\|\hat\sigma_K^2-\hat\sigma_J^2\|_{L^2([a,b])}^2\le s_K\Big\}.$$

Heuristically, $\hat J$ is the smallest projection level for which the stochastic error still dominates the bias.
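The selection rule for $\hat J$ can be sketched as follows, assuming NumPy. Three ingredients are hypothetical:

- the estimates $\hat\sigma^2_J$ are given as arrays on a uniform grid of $[a,b]$, and the squared $L^2([a,b])$-norm is approximated by a Riemann sum;
- the `threshold` helper implements the form of $s_J$ in (14) as written above;
- the toy example replaces $s_K$ by a constant threshold and the estimates by functions whose bias is $2^{-J}$.

```python
import numpy as np

def threshold(J, N, A):
    """Threshold s_J = A * 2^(3J) * loglog(N) / N as in (14)."""
    return A * 2.0 ** (3 * J) * np.log(np.log(N)) / N

def lepski_select(levels, estimates, thresholds, dx):
    """Smallest J with ||est_K - est_J||^2_{L^2([a,b])} <= s_K for all larger K in the grid."""
    levels = sorted(levels)
    for i, J in enumerate(levels):
        if all(np.sum((estimates[K] - estimates[J]) ** 2) * dx <= thresholds[K]
               for K in levels[i + 1:]):
            return J
    return levels[-1]  # not reached: the condition is vacuous for the largest level

# Toy example: bias 2^{-J} on a grid of 100 points with spacing 0.01.
M = 100
dx = 1.0 / M
truth = np.full(M, 0.3)
levels = [1, 2, 3, 4, 5]
estimates = {J: truth + 2.0 ** (-J) for J in levels}
thresholds = {J: 1e-3 for J in levels}
print("selected level:", lepski_select(levels, estimates, thresholds, dx))
print("s_J for J = 3, N = 20000, A = 0.01:", threshold(3, 20_000, 0.01))
```

Because the maximum over an empty set is taken to satisfy the condition, $\hat J$ is always well defined and at most $J_{\max}$.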

Our main result for the adaptive estimation shows that the estimator $\hat\sigma^2$ achieves the optimal convergence rate up to an additional $\log\log N$ factor.

Theorem 9. Grant Assumption 1 and define $\Gamma_0:=\{\gamma\in\Gamma:\ \mathbb E_\gamma[\tau^{-1/2}]\le D\}$. Let Assumption 3 be fulfilled for some $s>5/2$, and let $0<a<b<1$. Then for every $\varepsilon>0$ there exists some $C>0$ such that, for $N$ sufficiently large, we have

$$\sup_{(\sigma,b,\gamma)\in\Theta_s\times\Gamma_0}\mathbb P_{\sigma,b,\gamma}\Big(\|\hat\sigma^2-\sigma^2\|_{L^2([a,b])}^2>C\Big(\frac{\log\log N}{N}\Big)^{2s/(2s+3)}\Big)<\varepsilon.$$


The proof of this theorem is postponed to Section 9. It relies on a concentration inequality for the Markov chain $(X_{\tau_n})_{n\ge0}$, see Proposition 25 as well as Nickl and Söhl [23, Section 3]. For the latter we need the additional assumption on $\gamma$ in the definition of $\Gamma_0$. This assumption allows for a uniform bound on the transition density of the time-changed diffusion process. Apart from the concentration result, the proof relies on the standard arguments for Lepski's method.


6 Numerical example 

In this section, we present numerical results for the volatility estimation. Throughout the section, we consider a diffusion process $X$ with:

- linear mean-reverting drift $b(x)=0.2-0.4x$;
- quadratic squared volatility function $\sigma^2(x)=0.4-(x-0.5)^2$;
- two reflecting barriers at $0$ and $1$.

The sample paths were generated using the Euler-Maruyama scheme with time step size $0.001$ and reflection after each step.
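The simulation scheme just described can be sketched as follows, assuming NumPy. Several details are assumptions of this sketch and are not specified in the text:

- reflection is implemented as $x\mapsto-x$ below $0$ and $x\mapsto2-x$ above $1$;
- observation times are rounded to the Euler grid;
- the path starts at $x_0=0.5$ instead of the stationary law $\mu$, so a burn-in may be advisable.

The helper names are hypothetical.

```python
import math
import numpy as np

def reflect(x):
    """Reflect a point that left [0, 1] by less than one unit back into the interval."""
    if x < 0.0:
        return -x
    if x > 1.0:
        return 2.0 - x
    return x

def simulate_observations(deltas, x0=0.5, dt=1e-3, seed=0):
    """Euler-Maruyama with reflection after each step, observed at tau_n = cumsum(deltas)."""
    rng = np.random.default_rng(seed)
    taus = np.concatenate(([0.0], np.cumsum(deltas)))
    obs_idx = np.rint(taus / dt).astype(int)       # observation times rounded to the Euler grid
    n_steps = int(obs_idx[-1])
    noise = rng.standard_normal(n_steps) * math.sqrt(dt)
    path = np.empty(n_steps + 1)
    path[0] = x = x0
    for i in range(n_steps):
        drift = 0.2 - 0.4 * x                      # b(x) = 0.2 - 0.4 x
        vol = math.sqrt(0.4 - (x - 0.5) ** 2)      # sigma(x) = sqrt(0.4 - (x - 0.5)^2)
        x = reflect(x + drift * dt + vol * noise[i])
        path[i + 1] = x
    return taus, path[obs_idx]

rng = np.random.default_rng(5)
deltas = rng.exponential(0.25, size=500)           # exponential sampling with mean 0.25
taus, X = simulate_observations(deltas)
print(len(X), X.min(), X.max())
```

A plain Python loop is used for clarity. For the sample sizes of Table 1 one would vectorize over independent paths or use a compiled kernel.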
Figure 1: Sample path of the process $X$ for $0\le t\le4$ with marked observations from different sampling distributions.



                      Oracle projection level        Adaptive estimator
Distribution / N      4 000    12 000   20 000       4 000    12 000   20 000
Deterministic         0.0233   0.0155   0.0123       0.0318   0.0214   0.0130
Uniform               0.0258   0.0168   0.0134       0.0341   0.0221   0.0139
Exponential           0.0282   0.0177   0.0141       0.0362   0.0231   0.0148
Beta                  0.0296   0.0211   0.0179       0.0432   0.0255   0.0178

Table 1: Root mean integrated squared error for volatility estimation on [0.1, 0.9] based on 1000 Monte Carlo iterations.


For $\Delta=0.25$ we compare the estimation error for four sampling distributions of quite different shapes:

- equidistant observations with frequency $\Delta^{-1}$;
- the uniform distribution on the interval $[0,2\Delta]$;
- the symmetric $\operatorname{Beta}(0.2,0.2)$ distribution rescaled to the interval $[0,2\Delta]$;
- the exponential distribution with intensity $\Delta^{-1}$.

Note that all considered distributions have mean $\Delta$. The uniform and the Beta distribution have the same compact support $[0,2\Delta]$. Together with the exponential distribution, they allow for arbitrarily small sampling distances. Figure 1 depicts a fragment of a simulated trajectory of the diffusion together with the observations from the different sampling schemes.
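The four sampling schemes can be generated with NumPy's random generator. This is a minimal sketch whose function name is hypothetical. It checks that all four distributions have mean $\Delta=0.25$, and that the uniform and Beta distances stay in $[0,2\Delta]$.

```python
import numpy as np

def sample_distances(kind, n, delta=0.25, rng=None):
    """Draw n sampling distances with mean delta for the four schemes of Section 6."""
    rng = np.random.default_rng() if rng is None else rng
    if kind == "deterministic":
        return np.full(n, delta)
    if kind == "uniform":
        return rng.uniform(0.0, 2.0 * delta, size=n)
    if kind == "beta":
        return 2.0 * delta * rng.beta(0.2, 0.2, size=n)   # symmetric Beta(0.2, 0.2) on [0, 2 delta]
    if kind == "exponential":
        return rng.exponential(scale=delta, size=n)       # intensity 1 / delta
    raise ValueError(f"unknown sampling scheme: {kind}")

rng = np.random.default_rng(6)
samples = {k: sample_distances(k, 100_000, rng=rng)
           for k in ("deterministic", "uniform", "beta", "exponential")}
for k, d in samples.items():
    print(f"{k:>13}: mean {d.mean():.4f}, min {d.min():.2e}, max {d.max():.3f}")
```

The U-shaped $\operatorname{Beta}(0.2,0.2)$ law puts most of its mass near both ends of $[0,2\Delta]$. It thus produces many very short sampling distances, i.e. the strong clustering of observations discussed below.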

To construct the approximation spaces, we used the Fourier cosine basis, i.e.

$$V_J=\operatorname{span}\{\sqrt2\cos(j\pi x):\ 0\le j\le J\},$$

cf. Remark 5. We compare an oracle choice of the projection level with the adaptive estimator. As target interval we choose $[0.1,0.9]$.

In Table 1 we compare the oracle and adaptive root mean integrated squared error (RMISE) for volatility estimation on the interval $[0.1,0.9]$. The errors are obtained by a Monte Carlo simulation with 1000 iterations. The oracle projection level $J$ is stable with respect to the sampling distribution and surprisingly small. It equals 2 for N = 4 000 and 4 for N = 12 000 and N = 20 000 across all distributions. The only exception is the Beta distribution with sample size N = 12 000, where it equals 2. For the adaptive estimation we chose the constant $A$ in (14) equal to 0.01.

Relative to $\|\sigma^2\|_{L^2([0.1,0.9])}\approx0.31$, the error of the oracle decreases from about 7% to 10% (depending on the sampling distribution) for sample size N = 4 000 to about 4% to 6% for N = 20 000. In particular, for large sample sizes the error of the adaptive procedure is fairly close to the oracle error. For both estimators the errors are quite stable across sampling distributions. Deterministic sampling allows for the smallest errors, and the Beta distribution generates the largest ones. The latter is not surprising because the Beta distribution is chosen in a way that yields a strong clustering of the observations.
Figure 2: Estimated volatility functions using the adaptive estimator for 20 independent trajectories of the diffusion and four different sampling distributions with sample size N = 20 000.


For 20 independent paths and sample size N = 20 000, the resulting adaptive volatility estimators are shown in Figure 2. While the estimators behave nicely in the interior of the interval, the boundary problem outside the interval $[0.1,0.9]$ is clearly visible. Again we see that the estimation for the Beta sampling distribution is the worst.

In the misspecified case where the randomness of the observation times is ignored, the RMISE
of the low-frequency estimator designed for equidistant observations, with Δ set to the average
observation distance, is four times larger than the error of our method in our simulations.


7 Proofs of the upper bounds 

Throughout we take Assumptions [T] and [3] for granted.


7.1 Spectral properties of the generalized transition operator R 

Recall that $u_1$ is the eigenfunction corresponding to the largest negative eigenvalue $\nu_1$ of the
generator L, normalized in $L^2([0,1])$. By [14, Proposition 6.5], $u_1$ can be chosen to be increasing,
and for any 0 < a < b < 1 there exists a positive constant $c_{a,b} > 0$ such that

$$\inf_{(\sigma,b)\in\Theta_s}\ \inf_{x\in[a,b]} u_1'(x) \ge c_{a,b}. \qquad (15)$$


By Lemma 6.1 in [L] the family of generators $\{L_{\sigma,b} : (\sigma,b)\in\Theta_s\}$ has a uniform spectral gap on
$\Theta_s$, meaning that there is a constant $s_0 > 0$ such that

$$\inf_{(\sigma,b)\in\Theta_s}\ \inf_{i\neq 1}|\nu_i-\nu_1| = \inf_{(\sigma,b)\in\Theta_s}\min\{|\nu_1|,\ |\nu_2-\nu_1|\} \ge s_0. \qquad (16)$$

Moreover, the eigenvalues $\nu_k$, $k \ge 1$, satisfy uniformly on $\Theta_s$

$$C_1k^2 \le -\nu_k \le C_2k^2, \qquad (17)$$

for constants $0 < C_1 < C_2$, while the corresponding eigenfunctions $u_k$ belong to the Sobolev space
$H^{s+1}$, fulfilling

$$\|u_k\|_{H^{s+1}} \lesssim (1\vee|\nu_k|)^{(s+1)/2}. \qquad (18)$$

As announced in Lemma [3], these bounds transfer uniformly to the operator R.

Proof of Lemma [3]. For convenience we define m := min I > 0 and M := max I. By the definition
of R and the uniform bounds (17) on the eigenvalues $\nu_k$ of L, we have

$$\kappa_k = \mathcal L_\gamma(-\nu_k) = \int_0^\infty e^{t\nu_k}\,\gamma(dt) \ge \int_0^\infty e^{-tC_2k^2}\,\gamma(dt) \ge a\,e^{-MC_2k^2}\quad\text{for } k \ge 1.$$

The spectral gap of the operator R equals $\min\{1-\kappa_1,\ \kappa_1-\kappa_2\}$. Due to (16) and (17), we have

$$\kappa_1-\kappa_2 = \int_0^\infty\big(e^{t\nu_1}-e^{t\nu_2}\big)\,\gamma(dt) = \int_0^\infty e^{t\nu_2}\big(e^{t(\nu_1-\nu_2)}-1\big)\,\gamma(dt) \ge \int_0^\infty e^{-4tC_2}\big(e^{ts_0}-1\big)\,\gamma(dt) \ge a\,e^{-4MC_2}\big(e^{ms_0}-1\big).$$

Similarly, $1-\kappa_1 = \int_0^\infty(1-e^{t\nu_1})\,\gamma(dt) \ge \int_0^\infty(1-e^{-tC_1})\,\gamma(dt) \ge a(1-e^{-mC_1})$. In each chain the last step uses $\gamma(I) \ge a$ and $m \le t \le M$ on I.

□
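The map $\nu_k \mapsto \kappa_k = \mathcal L_\gamma(-\nu_k)$ and the resulting spectral gap $\min\{1-\kappa_1, \kappa_1-\kappa_2\}$ can be illustrated numerically. The sketch below assumes a toy spectrum (reflected Brownian motion on [0,1] with σ = 1, whose generator f''/2 has eigenvalues $-(k\pi)^2/2$) and two example sampling laws (exponential and deterministic times with mean 0.1); none of these choices come from the paper.

```python
import numpy as np


def laplace_exponential(z, mean):
    """Laplace transform L(z) = E[exp(-z * tau)] of an exponential law with the given mean."""
    return 1.0 / (1.0 + mean * np.asarray(z, dtype=float))


def laplace_deterministic(z, delta):
    """Laplace transform of the point mass at delta (equidistant sampling)."""
    return np.exp(-delta * np.asarray(z, dtype=float))


# Toy generator spectrum: nu_k = -(k pi)^2 / 2, k = 0, ..., 5.
k = np.arange(6)
nu = -0.5 * (np.pi * k) ** 2

kappa_exp = laplace_exponential(-nu, mean=0.1)     # kappa_k = L_gamma(-nu_k)
kappa_det = laplace_deterministic(-nu, delta=0.1)
gap_exp = min(1 - kappa_exp[1], kappa_exp[1] - kappa_exp[2])
gap_det = min(1 - kappa_det[1], kappa_det[1] - kappa_det[2])
```

Since $\mathcal L_\gamma$ is strictly decreasing, the ordering of the $\kappa_k$ mirrors that of the $-\nu_k$, and a gap in the generator spectrum turns into a gap for R.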


7.2 Consequences of the mixing property 

First we establish general bounds for the variance of integrals with respect to the empirical measure,
which are due to the mixing behavior of the sequence $(X_{T_n})_n$. The following lemma is a straightforward
generalization of [L, Lemma 6.2]. Since this is the key result to bound the stochastic
error, we give the proof to keep the paper self-contained.

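Lemma 10. For bounded $H_1,H_2\in L^2([0,1])$ we have the following two variance estimates:

$$\operatorname{Var}_{\sigma,b,\gamma}\Big(\frac1N\sum_{n=1}^{N}H_1(X_{T_n})\Big) \lesssim N^{-1}\,\mathbb E_{\sigma,b,\gamma}\big[H_1^2(X_0)\big],$$

$$\operatorname{Var}_{\sigma,b,\gamma}\Big(\frac1N\sum_{n=0}^{N-1}H_1(X_{T_n})H_2(X_{T_{n+1}})\Big) \lesssim N^{-1}\,\mathbb E_{\sigma,b,\gamma}\big[H_1^2(X_0)H_2^2(X_{T_1})\big].$$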


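Proof. Denote $f(X_{T_n}) = H_1(X_{T_n}) - \mathbb E_{\sigma,b,\gamma}[H_1(X_{T_n})]$. Consider m > n and let k = m − n. Since the
process X is stationary and R has a uniform spectral gap, $\|R^kf\|_{L^2(\mu)} \le \|f\|_{L^2(\mu)}\mathcal L_\gamma^k(s_0)$ holds for
every function f that is $L^2(\mu)$-orthogonal to constants: on this subspace the spectrum of R is bounded by
$\kappa_1 = \mathcal L_\gamma(-\nu_1) \le \mathcal L_\gamma(s_0)$, because $-\nu_1 \ge s_0$ by (16) and $\mathcal L_\gamma$ is decreasing. Arguing analogously as in the proof of
Lemma [3], we obtain $\sup_{\gamma\in\Gamma}\mathcal L_\gamma(s_0) < 1$. Hence, by the Cauchy-Schwarz inequality,

$$\mathbb E_{\sigma,b,\gamma}\big[f(X_{T_n})f(X_{T_m})\big] = \mathbb E_{\sigma,b,\gamma}\Big[f(X_{T_n})\,\mathbb E_{\sigma,b,\gamma}\big[f(X_{T_{n+k}})\,\big|\,X_{T_n}\big]\Big] = \langle f,R^kf\rangle_\mu \le \|f\|^2_{L^2(\mu)}\mathcal L_\gamma^k(s_0).$$

Since $\|f\|^2_{L^2(\mu)} = \operatorname{Var}_{\sigma,b,\gamma}[H_1(X_0)] \le \mathbb E_{\sigma,b,\gamma}[H_1^2(X_0)]$ and

$$\operatorname{Var}_{\sigma,b,\gamma}\Big(\sum_{n=1}^NH_1(X_{T_n})\Big) = \sum_{n,m=1}^N\mathbb E_{\sigma,b,\gamma}\big[f(X_{T_n})f(X_{T_m})\big] \le \|f\|^2_{L^2(\mu)}\sum_{n,m=1}^N\mathcal L_\gamma^{|n-m|}(s_0),$$

to prove the first inequality we just have to show that $\sum_{n,m=1}^N\mathcal L_\gamma^{|n-m|}(s_0) \lesssim N$. This easily follows
from the formula for the sum of a finite geometric series.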

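To prove the second inequality, first note that

$$\operatorname{Var}_{\sigma,b,\gamma}\Big(\frac1N\sum_{n=0}^{N-1}H_1(X_{T_n})H_2(X_{T_{n+1}})\Big) = \frac{1}{N^2}\,\mathbb E_{\sigma,b,\gamma}\Big[\sum_{n,m=0}^{N-1}H_1(X_{T_n})H_2(X_{T_{n+1}})H_1(X_{T_m})H_2(X_{T_{m+1}})\Big] - \langle H_1,RH_2\rangle_\mu^2.$$

Since the sum of the diagonal terms equals $N^{-1}\mathbb E_{\sigma,b,\gamma}[H_1^2(X_0)H_2^2(X_{T_1})]$, it does not exceed the claimed
upper bound. By the Markov property and the reversibility of X, the sum of the other terms equals

$$\frac{1}{N^2}\sum_{\substack{n,m=0\\ n\neq m}}^{N-1}\Big(\big\langle H_2\cdot(RH_1),\,R^{|n-m|-1}\big(H_1\cdot(RH_2)\big)\big\rangle_\mu - \langle H_1,RH_2\rangle_\mu^2\Big) - \frac1N\langle H_1,RH_2\rangle_\mu^2.$$

Using the spectral gap of the operator R together with the Cauchy-Schwarz inequality, we obtain
that

$$\Big|\big\langle H_2\cdot(RH_1),\,R^{|n-m|-1}\big(H_1\cdot(RH_2)\big)\big\rangle_\mu - \langle H_1,RH_2\rangle_\mu^2\Big| \le \|H_2\cdot(RH_1)\|_{L^2(\mu)}\,\|H_1\cdot(RH_2)\|_{L^2(\mu)}\,\mathcal L_\gamma^{|n-m|-1}(s_0).$$

Consequently, using again Cauchy-Schwarz and the formula for the sum of a finite geometric series,
we can bound the off-diagonal part of the considered variance by

$$\frac{1}{N^2}\sum_{\substack{n,m=0\\ n\neq m}}^{N-1}\|H_2\cdot(RH_1)\|_{L^2(\mu)}\|H_1\cdot(RH_2)\|_{L^2(\mu)}\mathcal L_\gamma^{|n-m|-1}(s_0) \lesssim \frac1N\|H_2\cdot(RH_1)\|_{L^2(\mu)}\|H_1\cdot(RH_2)\|_{L^2(\mu)}$$

$$\le \frac1N\,\mathbb E_{\sigma,b,\gamma}\big[H_1^2(X_0)H_2^2(X_{T_1})\big]^{1/2}\,\mathbb E_{\sigma,b,\gamma}\big[H_1^2(X_0)H_2^2(X_{T_1})\big]^{1/2} = \frac1N\,\mathbb E_{\sigma,b,\gamma}\big[H_1^2(X_0)H_2^2(X_{T_1})\big]. \qquad □$$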
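The geometric-series step used twice above can be checked directly. With $q = \mathcal L_\gamma(s_0) < 1$, the number of pairs (n, m) with |n − m| = k is N for k = 0 and 2(N − k) for k ≥ 1, which gives $\sum_{n,m=1}^N q^{|n-m|} \le N(1+q)/(1-q) \lesssim N$. A small Python sketch (function names are ours):

```python
def geometric_double_sum(N, q):
    """Exact value of sum_{n,m=1}^N q^{|n-m|}, grouped by the lag k = |n - m|."""
    return N + 2 * sum((N - k) * q**k for k in range(1, N))


def geometric_bound(N, q):
    """Upper bound N (1 + q) / (1 - q), valid for 0 <= q < 1."""
    return N * (1 + q) / (1 - q)


for N in (1, 5, 50):
    for q in (0.0, 0.3, 0.9):
        brute = sum(q ** abs(n - m) for n in range(N) for m in range(N))
        assert abs(brute - geometric_double_sum(N, q)) < 1e-9
        assert geometric_double_sum(N, q) <= geometric_bound(N, q) + 1e-9
```

The bound is linear in N with a constant that stays finite as long as q is bounded away from 1, which is exactly what $\sup_{\gamma\in\Gamma}\mathcal L_\gamma(s_0) < 1$ provides.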

The first consequence of the previous result is the following bound for the risk of the estimator 
of the invariant measure. 


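Proposition 11. Under Assumption [^] it holds

$$\mathbb E_{\sigma,b,\gamma}\big[\|\mu-\hat\mu_J\|^2_{L^2}\big] \lesssim 2^{-2Js} + N^{-1}2^J. \qquad (19)$$

Furthermore, if we choose $2^J \sim N^{1/(2s+3)}$, the event $\mathcal T_0 = \{\forall x\in[0,1]:\ \inf\mu/2 \le \hat\mu_J(x) \le 2\sup\mu\}$
satisfies $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_0) \lesssim N^{-2s/(2s+3)}$.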

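Proof. The explicit formula for μ shows that $\|\mu\|_{H^s}$ is uniformly bounded over $\Theta_s$. Jackson's
inequality yields

$$\|\mu-\pi_J\mu\|^2_{L^2} \lesssim 2^{-2Js}.$$

Using Lemma 10 with the empirical measure $\mu_N := \frac1N\sum_{n=1}^N\delta_{X_{T_n}}$, we obtain

$$\mathbb E_{\sigma,b,\gamma}\big[\|\pi_J\mu-\hat\mu_J\|^2_{L^2}\big] = \sum_{|\lambda|\le J}\mathbb E_{\sigma,b,\gamma}\big[\langle\psi_\lambda,\mu-\mu_N\rangle^2\big] = \sum_{|\lambda|\le J}\operatorname{Var}_{\sigma,b,\gamma}\big(\langle\psi_\lambda,\mu_N\rangle\big) \lesssim N^{-1}\sum_{|\lambda|\le J}\mathbb E_{\sigma,b,\gamma}\big[\psi_\lambda^2(X_0)\big] \lesssim 2^JN^{-1},$$

and (19) follows by the triangle inequality. Furthermore, by Jackson's inequality, $\|(I-\pi_J)\mu\|_\infty \lesssim 2^{-J(s-1)}$, and

$$\sup_{x\in[0,1]}\pi_J\mu(x) \le \|\mu\|_\infty + \|(I-\pi_J)\mu\|_\infty,\qquad \inf_{x\in[0,1]}\pi_J\mu(x) \ge \inf_{x\in[0,1]}\mu(x) - \|(I-\pi_J)\mu\|_\infty.$$

Hence, for J large enough, $\pi_J\mu$ is bounded from below by $c_1\inf\mu$ and from above by $c_2\sup\mu$ for some constants
$1/2 < c_1 \le 1 \le c_2 < 2$. Consequently, $\hat\mu_J(x)$ lies in $[\frac12\inf\mu,\ 2\sup\mu]$ if $\|\hat\mu_J-\pi_J\mu\|_\infty$ is small enough. For a given constant C > 0,
Markov's and Bernstein's inequalities show

$$\mathbb P_{\sigma,b,\gamma}\big(\|\pi_J\mu-\hat\mu_J\|_\infty > C\big) \le C^{-2}\,\mathbb E_{\sigma,b,\gamma}\big[\|\pi_J\mu-\hat\mu_J\|^2_\infty\big] \lesssim 2^J\,\mathbb E_{\sigma,b,\gamma}\big[\|\pi_J\mu-\hat\mu_J\|^2_{L^2}\big] \lesssim 2^{2J}N^{-1} \lesssim N^{-2s/(2s+3)}.$$

□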
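The projection estimator $\hat\mu_J = \sum_{|\lambda|\le J}\langle\psi_\lambda,\mu_N\rangle\psi_\lambda$ analyzed above can be sketched as follows, assuming the orthonormal cosine basis from the simulation section in place of the general $\psi_\lambda$; the function names are hypothetical. Each coefficient is simply the sample mean of $\psi_\lambda(X_{T_n})$.

```python
import numpy as np


def cosine_basis(x, J):
    """Orthonormal cosine basis of L^2([0,1]): constant, then sqrt(2) cos(j pi x)."""
    x = np.asarray(x, dtype=float)
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(J + 1)))
    B[:, 0] = 1.0
    return B


def invariant_density_estimator(X, J):
    """Projection estimator mu_hat_J = sum_lambda <psi_lambda, mu_N> psi_lambda.

    X holds the observations X_{T_1}, ..., X_{T_N}; mu_N is their empirical measure,
    so each coefficient is the sample mean of psi_lambda(X_{T_n}).
    Returns a function that evaluates mu_hat_J on [0, 1].
    """
    coef = cosine_basis(X, J).mean(axis=0)
    return lambda x: cosine_basis(x, J) @ coef


# Perfectly "uniform" sample (cell midpoints): the estimate must be the constant 1.
X = (np.arange(1000) + 0.5) / 1000
mu_hat = invariant_density_estimator(X, 6)
```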


7.3 Analysis of the projection error 

Denote by $(\kappa_{J,i},u_{J,i})$, $i = 0,1,2,\dots,\dim V_J-1$, the eigenpairs of the operator $\pi_JR\pi_J$, ordered
decreasingly with respect to the eigenvalues. Note that $(\kappa_{J,i},u_{J,i})$ are solutions of the eigenvalue
problem for the operator R restricted to the finite-dimensional approximation space $V_J$ in $L^2(\mu)$:

$$\langle Ru_{J,i},v\rangle_\mu = \kappa_{J,i}\langle u_{J,i},v\rangle_\mu\quad\text{for every } v\in V_J. \qquad (20)$$

Take $u_{J,i}$ normalized in the $L^2$ norm. Since $\pi_JR\pi_J$ is a positive definite self-adjoint operator on
$L^2(\mu)$ with $\|\pi_JR\pi_J\|_{L^2(\mu)} \le 1$, we have $0 < \kappa_{J,i} \le 1$.

Proposition 12. For sufficiently large J it holds uniformly on $\Theta_s$

$$|\kappa_{J,1}-\kappa_1| + \|u_{J,1}-u_1\|_{H^1} \lesssim 2^{-Js}.$$

Proof. It suffices to show that $|\kappa_{J,1}-\kappa_1| + \|u_{J,1}-u_1\|_{L^2} \lesssim 2^{-J(s+1)}$. Indeed, by Jackson's and
Bernstein's inequalities

$$\|u_{J,1}-u_1\|_{H^1} \le \|u_{J,1}-\pi_Ju_1\|_{H^1} + \|(I-\pi_J)u_1\|_{H^1} \lesssim 2^J\|u_{J,1}-\pi_Ju_1\|_{L^2} + \|(I-\pi_J)u_1\|_{H^1}$$

$$\le 2^J\|u_{J,1}-u_1\|_{L^2} + 2^J\|(I-\pi_J)u_1\|_{L^2} + \|(I-\pi_J)u_1\|_{H^1} \lesssim 2^J\|u_{J,1}-u_1\|_{L^2} + 2^{-Js},$$

where we used the upper bound (18).

Recall that R is a compact, self-adjoint, positive definite operator on $L^2(\mu)$. Furthermore, denoting by $\pi_J^\mu$ the
$L^2(\mu)$-orthogonal projection onto $V_J$,

$$\|(I-\pi_J^\mu)u_1\|_{L^2(\mu)} \le \|(I-\pi_J^\mu)(I-\pi_J)u_1\|_{L^2(\mu)} \lesssim \|(I-\pi_J)u_1\|_{L^2} \lesssim 2^{-J(s+1)}\|u_1\|_{H^{s+1}} \lesssim 2^{-J(s+1)}.$$

Consequently, since by Lemma [3] the operator R has a uniform spectral gap, the inequality

$$\|(I-\pi_J^\mu)u_1\|_{L^2(\mu)} \le \frac{\kappa_1-\kappa_2}{2}$$

holds for J large enough. It follows that we can use Theorem [55], obtaining

$$|\kappa_{J,1}-\kappa_1| + \Big\|\frac{u_{J,1}}{\|u_{J,1}\|_{L^2(\mu)}} - \frac{u_1}{\|u_1\|_{L^2(\mu)}}\Big\|_{L^2(\mu)} \lesssim 2^{-J(s+1)}.$$

The claim follows since $\|u_{J,1}-u_1\|_{L^2} \lesssim \big\|\frac{u_{J,1}}{\|u_{J,1}\|_{L^2(\mu)}} - \frac{u_1}{\|u_1\|_{L^2(\mu)}}\big\|_{L^2(\mu)}$ by the equivalence of the norms
$\|\cdot\|_{L^2}$ and $\|\cdot\|_{L^2(\mu)}$.

□


Corollary 13. The projected operators $\pi_JR\pi_J$ have a uniform spectral gap, i.e., there exists $s_1 > 0$
such that

$$\min\{|1-\kappa_{J,1}|,\ |\kappa_{J,2}-\kappa_{J,1}|\} \ge s_1$$

for every J large enough.

Proof. This follows from the proof of Theorem [55]. □
7.4 Analysis of the stochastic error

Define the operator $R_J : V_J\to V_J$ as the restriction of the operator $\pi_JR\pi_J$ to the finite-dimensional
Hilbert space $V_J$. Recall that the operator $G_J$ was defined by the Gram matrix of the inner product $\langle\cdot,\cdot\rangle_\mu$,
i.e., for $v\in V_J$ we have $\langle v,G_Jv\rangle = \langle v,v\rangle_\mu$. Note that by (20)

$$R_Ju_{J,i} = \kappa_{J,i}G_Ju_{J,i}, \qquad (21)$$

hence $(\kappa_{J,i},u_{J,i})$ are solutions of the generalized symmetric eigenvalue problem for $(R_J,G_J)$. When the
matrix $\hat G_J$ is invertible, the corresponding generalized eigenvalue problem for $(\hat R_J,\hat G_J)$, namely

$$\hat R_J\hat u_{J,i} = \hat\kappa_{J,i}\hat G_J\hat u_{J,i}, \qquad (22)$$

has $\dim V_J$ solutions that we denote by $(\hat\kappa_{J,i},\hat u_{J,i})$, $i = 0,1,\dots,\dim V_J-1$. Recall that the
eigenfunctions $\hat u_{J,i}$ are normalized in $L^2([0,1])$.
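Numerically, (22) can be solved by reducing it with a Cholesky factor of $\hat G_J$ to a standard symmetric eigenvalue problem. The sketch below is one way to do this under stated assumptions, not the paper's implementation: it uses the cosine basis from the simulation section, builds $\hat R_J$ from consecutive observations and symmetrizes it (our choice, so that the problem is symmetric), and uses a reflected Gaussian random walk as stand-in data. All function names are hypothetical.

```python
import numpy as np


def cosine_basis(x, J):
    """Orthonormal cosine basis of L^2([0,1]): constant, then sqrt(2) cos(j pi x)."""
    x = np.asarray(x, dtype=float)
    B = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, np.arange(J + 1)))
    B[:, 0] = 1.0
    return B


def spectral_pairs(X, J):
    """Solve R_hat v = kappa G_hat v for observations X = (X_{T_0}, ..., X_{T_N}).

    G_hat is the empirical Gram matrix, R_hat the symmetrized empirical cross-moment
    matrix of consecutive observations. Eigenvalues are returned in decreasing order;
    columns of V are coefficient vectors scaled to unit L^2 norm.
    """
    B = cosine_basis(X, J)
    N = len(X) - 1
    G = B[:-1].T @ B[:-1] / N           # (1/N) sum psi(X_n) psi(X_n)^T
    R = B[:-1].T @ B[1:] / N            # (1/N) sum psi(X_n) psi(X_{n+1})^T
    R = 0.5 * (R + R.T)                 # symmetrization (assumption of this sketch)
    L = np.linalg.cholesky(G)           # G = L L^T, requires G positive definite
    Linv = np.linalg.inv(L)
    kappa, W = np.linalg.eigh(Linv @ R @ Linv.T)   # standard symmetric problem
    order = np.argsort(kappa)[::-1]
    kappa, W = kappa[order], W[:, order]
    V = Linv.T @ W                      # back-transform: R V = G V diag(kappa)
    V = V / np.linalg.norm(V, axis=0)   # orthonormal basis => unit L^2 norm
    return kappa, V, G, R


def reflected_random_walk(n, step, rng):
    """Toy data: Gaussian random walk reflected at 0 and 1 (not an exact diffusion sampler)."""
    x = np.empty(n)
    x[0] = rng.uniform()
    for i in range(1, n):
        y = abs(x[i - 1] + step * rng.standard_normal()) % 2.0
        x[i] = 2.0 - y if y > 1.0 else y
    return x


rng = np.random.default_rng(0)
X = reflected_random_walk(5001, 0.1, rng)
kappa, V, G, R = spectral_pairs(X, J=5)
```

The largest eigenvalue belongs to the (nearly) constant eigenfunction, and the second pair plays the role of $(\hat\kappa_{J,1}, \hat u_{J,1})$.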

In this subsection we want to bound the expected error between $(\kappa_{J,1},u_{J,1})$ and $(\hat\kappa_{J,1},\hat u_{J,1})$.
From the general theory of a posteriori error bounds for generalized symmetric eigenvalue
problems (see Section A.2) we know that the error between the eigenpairs can be controlled by
the norm of the residual vectors

$$r = (\hat R_J-R_J)u_{J,1} + \kappa_{J,1}(G_J-\hat G_J)u_{J,1}\quad\text{or}\quad r^* = (\hat R_J-R_J)\hat u_{J,1} + \hat\kappa_{J,1}(G_J-\hat G_J)\hat u_{J,1}.$$

Since the eigenpair $(\hat\kappa_{J,1},\hat u_{J,1})$ of the problem (22) is random and depends on the operators $\hat R_J$ and
$\hat G_J$, it is easier to analyze the norm of the vector r rather than $r^*$ (cf. Lemmas 14 and 15, where v
is a deterministic function). Consequently, in the following we refer to r as the residual vector. In
the notation of Section A.2 we treat the deterministic problem (21) as a perturbed approximation
of the data-dependent problem (22).
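For completeness, r is the defect of the deterministic eigenpair $(\kappa_{J,1},u_{J,1})$ when it is plugged into the data-dependent problem (22); the identity only uses (21):

```latex
\hat R_J u_{J,1} - \kappa_{J,1}\hat G_J u_{J,1}
  = (\hat R_J - R_J)u_{J,1} + \underbrace{R_J u_{J,1}}_{=\,\kappa_{J,1}G_J u_{J,1}} - \kappa_{J,1}\hat G_J u_{J,1}
  = (\hat R_J - R_J)u_{J,1} + \kappa_{J,1}(G_J - \hat G_J)u_{J,1} = r .
```

In particular, r vanishes exactly when $(\kappa_{J,1},u_{J,1})$ also solves (22), and it depends on the data only through $\hat R_J - R_J$ and $\hat G_J - G_J$ applied to a fixed vector.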

Lemma 14. For any $v\in V_J$ we have, uniformly on $\Theta_s\times\Gamma$,

$$\mathbb E_{\sigma,b,\gamma}\big[\|(\hat G_J-G_J)v\|^2_{L^2}\big] \lesssim N^{-1}2^J\|v\|^2_{L^2}.$$

Proof. Given Lemma 10, the proof is a straightforward estimate analogous to [14, Lemma 4.8]. □


Now we are ready to prove Lemma 6.

Proof of Lemma 6. A standard Neumann series argument shows that $\hat G_J$ is invertible on $\mathcal T_1$ with
$\|\hat G_J^{-1}\|_{L^2} \le 2\|G_J^{-1}\|_{L^2}$. Since the invariant density μ has a positive lower bound uniformly on
$\Theta_s$, for any $v\in V_J$ we have

$$\langle v,G_Jv\rangle = \langle v,v\rangle_\mu = \|v\|^2_{L^2(\mu)} \gtrsim \|v\|^2_{L^2}.$$

Hence the smallest eigenvalue of the operator $G_J$ is uniformly separated from zero. This implies
that $G_J^{-1}$ is uniformly bounded in the operator norm. The classical Hilbert-Schmidt norm
inequality yields

$$\|\hat G_J-G_J\|^2_{L^2} \le \sum_{|\lambda|\le J}\|(\hat G_J-G_J)\psi_\lambda\|^2_{L^2}.$$

Consequently, by Lemma 14, $\mathbb E_{\sigma,b,\gamma}[\|\hat G_J-G_J\|^2_{L^2}] \lesssim N^{-1}2^{2J}$, and $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_1) \lesssim N^{-1}2^{2J}$ follows
from Chebyshev's inequality. □


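Lemma 15. For any $v\in V_J$ we have, uniformly on $\Theta_s\times\Gamma$,

$$\mathbb E_{\sigma,b,\gamma}\big[\|(\hat R_J-R_J)v\|^2_{L^2}\big] \lesssim N^{-1}2^J\|v\|^2_{L^2}.$$

Proof. By Lemma 10 we obtain

$$\mathbb E_{\sigma,b,\gamma}\big[\|(\hat R_J-R_J)v\|^2_{L^2}\big] = \sum_{|\lambda|\le J}\operatorname{Var}_{\sigma,b,\gamma}\Big(\frac1N\sum_{n=0}^{N-1}\psi_\lambda(X_{T_n})v(X_{T_{n+1}})\Big) \lesssim N^{-1}\sum_{|\lambda|\le J}\mathbb E_{\sigma,b,\gamma}\big[\psi_\lambda^2(X_{T_1})v^2(X_0)\big]$$

$$\le N^{-1}\Big\|\sum_{|\lambda|\le J}\psi_\lambda^2\Big\|_\infty\mathbb E_{\sigma,b,\gamma}\big[v^2(X_0)\big] \lesssim N^{-1}2^J\|v\|^2_{L^2}.$$

□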


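Corollary 16. We have, uniformly on $\Theta_s\times\Gamma$, the following bound on the norm of the residual
vector $r = (\hat R_J-R_J)u_{J,1} + \kappa_{J,1}(G_J-\hat G_J)u_{J,1}$:

$$\mathbb E_{\sigma,b,\gamma}\big[\|r\|^2_{L^2}\big] \lesssim N^{-1}2^J.$$

Proof. Note that from Proposition 12 we know that, for J big enough, the eigenvalue $\kappa_{J,1}$ is
uniformly bounded. Consequently

$$\mathbb E_{\sigma,b,\gamma}\big[\|r\|^2_{L^2}\big] \lesssim \mathbb E_{\sigma,b,\gamma}\big[\|(\hat R_J-R_J)u_{J,1}\|^2_{L^2}\big] + \mathbb E_{\sigma,b,\gamma}\big[\|(\hat G_J-G_J)u_{J,1}\|^2_{L^2}\big] \lesssim N^{-1}2^J$$

by Lemmas 14 and 15. □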

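Proposition 17. There exists a set $\mathcal T_2\subset\mathcal T_1$ such that

$$\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_2) \lesssim N^{-1}2^{3J},$$

on $\mathcal T_2$ the eigenpair $(\hat\kappa_{J,1},\hat u_{J,1})$ is the biggest nontrivial eigenpair of the matrix $\hat G_J^{-1}\hat R_J$, and

$$\mathbb E_{\sigma,b,\gamma}\Big[\mathbf 1_{\mathcal T_2}\big(|\hat\kappa_{J,1}-\kappa_{J,1}|^2 + \|\hat u_{J,1}-u_{J,1}\|^2_{L^2}\big)\Big] \lesssim N^{-1}2^J$$

holds uniformly on $\Theta_s$.

Proof. By Theorem [00] there exists some $0 \le i_0 \le \dim V_J-1$ such that the eigenpair $(\hat\kappa_{J,i_0},\hat u_{J,i_0})$
of the problem (22) satisfies

$$|\hat\kappa_{J,i_0}-\kappa_{J,1}| \le \|\hat G_J^{-1}\|_{L^2}\|r\|_{L^2},\qquad \|\hat u_{J,i_0}-u_{J,1}\|_{L^2} \le \frac{2\sqrt2}{\delta(\hat\kappa_{J,i_0})}\|\hat G_J\|^{1/2}_{L^2}\|\hat G_J^{-1}\|^{3/2}_{L^2}\|r\|_{L^2},$$

where $\delta(\hat\kappa_{J,i_0}) = \min_{j\neq i_0}|\hat\kappa_{J,j}-\kappa_{J,1}|$ is the isolation distance of the eigenvalues $\hat\kappa_{J,i_0}$ and $\kappa_{J,1}$.
Let $s_1$ be the uniform spectral gap of the operators $R_J$ (see Corollary 13). Define $\mathcal T_2$ as the subset of
$\mathcal T_1$ on which $i_0 = 1$ and $\delta(\hat\kappa_{J,1}) \ge s_1/2$. Since $\|\hat G_J^{-1}\|_{L^2}$ and $\|\hat G_J\|_{L^2}$ are uniformly bounded on the
event $\mathcal T_1$ and $\mathbb E_{\sigma,b,\gamma}[\|r\|^2_{L^2}] \lesssim N^{-1}2^J$ by Corollary 16, the desired error bound holds when we restrict to the event
$\mathcal T_2$.

To finish the proof we must show that $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_2) \lesssim N^{-1}2^{3J}$. Denote

$$\mathcal T_2 = \mathcal T_1\cap\underbrace{\{i_0 = 1\}}_{=:\,\mathcal T_{2,1}}\cap\underbrace{\{\delta(\hat\kappa_{J,1}) \ge s_1/2\}}_{=:\,\mathcal T_{2,2}}.$$

First, using the absolute Weyl theorem (Theorem [27]), we observe that for any $0 \le j \le \dim V_J-1$

$$\mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_1}|\hat\kappa_{J,j}-\kappa_{J,j}|^2\big] \le \mathbb E_{\sigma,b,\gamma}\Big[\mathbf 1_{\mathcal T_1}\|\hat G_J^{-1}\|^2_{L^2}\big\|(\hat R_J-R_J)-\kappa_{J,j}(\hat G_J-G_J)\big\|^2_{L^2}\Big] \lesssim \mathbb E_{\sigma,b,\gamma}\big[\|\hat R_J-R_J\|^2_{L^2} + \|\hat G_J-G_J\|^2_{L^2}\big] \lesssim N^{-1}2^{2J}$$

by the classical Hilbert-Schmidt norm inequality and Lemmas 14 and 15. This bound is uniform in j and therefore also applies to the
random index $i_0$. Consequently, using the uniform lower bound on the spectral gap of $R_J$ (on $\mathcal T_1\setminus\mathcal T_{2,1}$ we have
$i_0 \neq 1$ and hence $|\kappa_{J,i_0}-\kappa_{J,1}| \ge s_1$), we obtain

$$\mathbb P_{\sigma,b,\gamma}(\mathcal T_1\setminus\mathcal T_{2,1}) \le \mathbb E_{\sigma,b,\gamma}\Big[\mathbf 1_{\mathcal T_1\setminus\mathcal T_{2,1}}\frac{|\kappa_{J,i_0}-\kappa_{J,1}|^2}{s_1^2}\Big] \lesssim \mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_1}|\kappa_{J,i_0}-\hat\kappa_{J,i_0}|^2\big] + \mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_1}|\hat\kappa_{J,i_0}-\kappa_{J,1}|^2\big] \lesssim N^{-1}2^{2J}.$$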
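Consider now the event $\mathcal T_{2,2}$. Since

$$\delta(\hat\kappa_{J,1}) = \min_{j\neq1}|\hat\kappa_{J,j}-\kappa_{J,1}| \ge \min_{j\neq1}\big\{|\kappa_{J,j}-\kappa_{J,1}| - |\hat\kappa_{J,j}-\kappa_{J,j}|\big\} \ge s_1 - \max_j|\hat\kappa_{J,j}-\kappa_{J,j}|,$$

we have

$$\mathbb P_{\sigma,b,\gamma}(\mathcal T_1\setminus\mathcal T_{2,2}) \le \mathbb P_{\sigma,b,\gamma}\Big(\mathcal T_1\cap\Big\{\max_j|\hat\kappa_{J,j}-\kappa_{J,j}| > s_1/2\Big\}\Big) \le \sum_{0\le j\le\dim V_J-1}\mathbb P_{\sigma,b,\gamma}\big(\mathcal T_1\cap\{|\hat\kappa_{J,j}-\kappa_{J,j}| > s_1/2\}\big)$$

$$\lesssim \sum_{0\le j\le\dim V_J-1}\mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_1}|\hat\kappa_{J,j}-\kappa_{J,j}|^2\big] \lesssim N^{-1}2^{3J}.$$

□

7.5 Proof of Theorem [7]

From now on we choose $2^J \sim N^{1/(2s+3)}$. Recall that the biggest negative eigenvalue of the infinitesimal
generator L is denoted by $\nu_1$, which is estimated by $\hat\nu_{J,1}$ from (COD).

Lemma 18. Choose $2^J \sim N^{1/(2s+3)}$. There is an event $\mathcal T_3\subset\mathcal T_2$ satisfying $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_3) \lesssim N^{-2s/(2s+3)}$
uniformly on $\Theta_s\times\Gamma$ and

$$\sup_{(\sigma,b,\gamma)\in\Theta_s\times\Gamma}\mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_3}|\nu_1-\hat\nu_{J,1}|^2\big] \lesssim N^{-2s/(2s+3)}.$$

In particular, we can assume that $\hat\nu_{J,1}$ is uniformly bounded on $\mathcal T_3$.

Proof. For convenience we denote m := min I, M := max I. On $\mathcal T_2$ we have $\hat\kappa_{J,1} > 0$ and thus
$\hat\kappa_{J,1} = \hat{\mathcal L}(-\hat\nu_{J,1})$, where $\hat{\mathcal L}(y) = \frac1N\sum_{n=1}^Ne^{-y\Delta_n}$ denotes the empirical Laplace transform of the observation distances.

Step 1: Let us start with a consistency result for $\hat\nu_{J,1}$. Since $\hat{\mathcal L}$ is non-increasing and continuous,
we have for any fixed $\varepsilon\in(0,C_1)$ with $C_1$ from (17)

$$\mathbb P_\gamma\big(|\hat\nu_{J,1}-\nu_1| < \varepsilon\big) \ge \mathbb P_\gamma\big(\hat{\mathcal L}(-\nu_1+\varepsilon) < \hat\kappa_{J,1} < \hat{\mathcal L}(-\nu_1-\varepsilon)\big).$$

Using

$$\delta := a\,m\,e^{(\nu_1-\varepsilon)M} \le \inf_{\gamma\in\Gamma}\ \inf_{|y+\nu_1|<\varepsilon}|\mathcal L_\gamma'(y)|, \qquad (23)$$

we have $|\mathcal L_\gamma(-\nu_1)-\mathcal L_\gamma(-\nu_1\pm\varepsilon)| \ge \delta\varepsilon$ uniformly in $\gamma\in\Gamma$ and, since $\kappa_1 = \mathcal L_\gamma(-\nu_1)$,

$$\mathbb P_{\sigma,b,\gamma}\big(|\hat\nu_{J,1}-\nu_1| \ge \varepsilon\big) \le \mathbb P_{\sigma,b,\gamma}\big(\kappa_1-\hat\kappa_{J,1} \ge \kappa_1-\hat{\mathcal L}(-\nu_1+\varepsilon)\big) + \mathbb P_{\sigma,b,\gamma}\big(\hat\kappa_{J,1}-\kappa_1 \ge \hat{\mathcal L}(-\nu_1-\varepsilon)-\kappa_1\big)$$

$$\le \sum_{y\in\{-\varepsilon,+\varepsilon\}}\mathbb P_{\sigma,b,\gamma}\big(|\hat\kappa_{J,1}-\kappa_1| + |\hat{\mathcal L}(-\nu_1+y)-\mathcal L_\gamma(-\nu_1+y)| \ge \delta\varepsilon\big)$$

$$\le 2\,\mathbb P_{\sigma,b,\gamma}\Big(|\hat\kappa_{J,1}-\kappa_1| \ge \frac{\delta\varepsilon}{2}\Big) + \sum_{y\in\{-\varepsilon,+\varepsilon\}}\mathbb P_\gamma\Big(|\hat{\mathcal L}(-\nu_1+y)-\mathcal L_\gamma(-\nu_1+y)| \ge \frac{\delta\varepsilon}{2}\Big).$$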
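By Propositions 12 and 17 and Markov's inequality, the first probability is of the order $N^{-2s/(2s+3)}$
if $2^J \sim N^{1/(2s+3)}$. For the estimation error of $\hat{\mathcal L}$, Markov's inequality yields for any y > 0

$$\mathbb P_\gamma\big(|\hat{\mathcal L}(y)-\mathcal L_\gamma(y)| \ge \delta\varepsilon/2\big) \le \Big(\frac{\delta\varepsilon}{2}\Big)^{-2}\mathbb E_\gamma\big[|\hat{\mathcal L}(y)-\mathcal L_\gamma(y)|^2\big] \le \frac{4\,\mathcal L_\gamma(2y)}{N\delta^2\varepsilon^2}.$$

Therefore,

$$\mathbb P_{\sigma,b,\gamma}\big(|\hat\nu_{J,1}-\nu_1| \ge \varepsilon\big) \lesssim N^{-2s/(2s+3)}. \qquad (24)$$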

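Step 2: To determine the rate of $\hat\nu_{J,1}$, we use a Taylor expansion, which yields for some intermediate
point ξ between $-\nu_1$ and $-\hat\nu_{J,1}$

$$\hat\kappa_{J,1} = \hat{\mathcal L}(-\hat\nu_{J,1}) = \hat{\mathcal L}(-\nu_1) + (\nu_1-\hat\nu_{J,1})\hat{\mathcal L}'(\xi).$$

Since on the other hand we have $\hat\kappa_{J,1} = \mathcal L_\gamma(-\nu_1) + \hat\kappa_{J,1} - \kappa_1$, we conclude

$$\nu_1-\hat\nu_{J,1} = \frac{\mathcal L_\gamma(-\nu_1) - \hat{\mathcal L}(-\nu_1) + \hat\kappa_{J,1} - \kappa_1}{\hat{\mathcal L}'(\xi)},$$

provided the denominator can be bounded away from zero uniformly with high probability. By (24) the event
$\mathcal T_{3,1} := \{|\hat\nu_{J,1}-\nu_1| < \varepsilon\}$ has probability at least $1 - cN^{-2s/(2s+3)}$ for some c > 0. On $\mathcal T_{3,1}$ we
have

$$|\hat{\mathcal L}'(\xi)| \ge \inf_{|y+\nu_1|<\varepsilon}|\mathcal L_\gamma'(y)| - \sup_{|y+\nu_1|<\varepsilon}|\hat{\mathcal L}'(y)-\mathcal L_\gamma'(y)|.$$

With δ from (23) we conclude that $|\hat{\mathcal L}'(\xi)| \ge \delta/2$ on the event $\mathcal T_{3,2} := \{\sup_{y\in[-\nu_1-\varepsilon,-\nu_1+\varepsilon]}|\hat{\mathcal L}'(y)-\mathcal L_\gamma'(y)| \le \delta/2\}$.
Note that in $\mathcal T_{3,2}$ we take the supremum of the empirical process related to
$(\Delta_n)_{n=1,\dots,N}$ acting on the function class $\mathcal F := \{[0,\infty)\ni x\mapsto xe^{-yx} : y\in[|\nu_1|-\varepsilon,\,|\nu_1|+\varepsilon]\}$.
Since $\mathcal F$ is the product of the identity map with the class $\{x\mapsto e^{-yx} : y > 0\}$, $\mathcal F$
is a Vapnik-Červonenkis class and admits the constant envelope function $(|\nu_1|-\varepsilon)^{-1}e^{-1}$. Empirical
process theory (e.g., van der Vaart and Wellner [33, Thm. 2.14.1]) yields

$$\mathbb E_\gamma\Big[\sup_{y\in[-\nu_1-\varepsilon,-\nu_1+\varepsilon]}|\hat{\mathcal L}'(y)-\mathcal L_\gamma'(y)|^2\Big] \lesssim \frac{1}{N(|\nu_1|-\varepsilon)^2},$$

and by Markov's inequality $\mathbb P_\gamma(\Omega\setminus\mathcal T_{3,2}) \lesssim 1/N$. With $\mathcal T_3 := \mathcal T_{3,1}\cap\mathcal T_{3,2}\cap\mathcal T_2$ we finally obtain

$$\mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_3}|\hat\nu_{J,1}-\nu_1|^2\big] \le 2\,\mathbb E_{\sigma,b,\gamma}\Big[\mathbf 1_{\mathcal T_3}\frac{|\mathcal L_\gamma(-\nu_1)-\hat{\mathcal L}(-\nu_1)|^2 + |\kappa_1-\hat\kappa_{J,1}|^2}{|\hat{\mathcal L}'(\xi)|^2}\Big] \lesssim N^{-1} + \mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_3}|\hat\kappa_{J,1}-\kappa_1|^2\big] \lesssim N^{-2s/(2s+3)}.$$

□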
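Because $\hat{\mathcal L}$ is strictly decreasing when all $\Delta_n > 0$, the estimator $\hat\nu_{J,1}$ defined by $\hat{\mathcal L}(-\hat\nu_{J,1}) = \hat\kappa_{J,1}$ can be computed robustly by bisection. The following is a sketch with hypothetical function names; the bracketing strategy and the tolerance are our choices.

```python
import numpy as np


def empirical_laplace(y, deltas):
    """hat L(y) = (1/N) sum_n exp(-y * Delta_n); non-increasing in y >= 0."""
    return float(np.mean(np.exp(-y * np.asarray(deltas, dtype=float))))


def estimate_nu1(kappa_hat, deltas, tol=1e-12, y_max=1e8):
    """Return nu_hat < 0 solving hat L(-nu_hat) = kappa_hat, by bisection in y = -nu_hat."""
    if not 0.0 < kappa_hat < 1.0:
        raise ValueError("kappa_hat must lie in (0, 1)")
    lo, hi = 0.0, 1.0
    while empirical_laplace(hi, deltas) > kappa_hat:   # enlarge until the root is bracketed
        hi *= 2.0
        if hi > y_max:
            raise ValueError("root not bracketed; are all Delta_n > 0?")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if empirical_laplace(mid, deltas) > kappa_hat:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)


# Equidistant sampling with Delta = 1: hat L(y) = exp(-y), so kappa_hat = e^{-2} gives nu_hat = -2.
nu_hat = estimate_nu1(np.exp(-2.0), np.ones(10))
```

For equidistant observations this reduces to the familiar $\hat\nu = \Delta^{-1}\log\hat\kappa$; for random sampling the inversion acts on the empirical Laplace transform, whose fluctuation is the term $\hat{\mathcal L}(-\nu_1) - \mathcal L_\gamma(-\nu_1)$ in Step 2.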


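Corollary 19. Choosing $2^J \sim N^{1/(2s+3)}$, there exists an event $\mathcal T_4 = \mathcal T_0\cap\mathcal T_3$ of high probability,
i.e. $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_4) \lesssim N^{-2s/(2s+3)}$, such that the estimators $\hat\mu_J$ and $\hat\nu_{J,1}$ are uniformly bounded on
$\mathcal T_4$. Furthermore, for N big enough, we have uniformly on $\Theta_s$ and Γ

$$\mathbb E_{\sigma,b,\gamma}\Big[\mathbf 1_{\mathcal T_4}\big(|\nu_1-\hat\nu_{J,1}|^2 + \|u_1-\hat u_{J,1}\|^2_{H^1}\big)\Big] \lesssim N^{-2s/(2s+3)}.$$

Proof. Note that $\mathcal T_4$ is a subset of the events from Proposition 17 and Lemma 18 and of the event that
$\hat\mu_J$ is uniformly bounded from below and above (see Proposition 11). Hence $\mathcal T_4$ is a high-probability
event, and by Propositions 12 and 17 (together with Bernstein's inequality on $V_J$, which converts the $L^2$-bound
into an $H^1$-bound at the cost of a factor $2^{2J}$) the choice $2^J \sim N^{1/(2s+3)}$ yields the claimed bound on the
expectation. □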
Before we present the proof of Theorem [7], we need another representation of the volatility
estimator, which allows us to bound the derivative of the estimated eigenfunction.

Lemma 20. Set 0 < a < b < 1. There exists a high-probability event $\mathcal T_5\subset\mathcal T_4$, $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_5) \lesssim N^{-2s/(2s+3)}$,
such that for all $x\in[a,b]$

$$\mathbf 1_{\mathcal T_5}\,\hat\sigma^2_J(x) = \mathbf 1_{\mathcal T_5}\,\frac{2\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy}{\big(\hat u'_{J,1}(x)\vee c'_{a,b}\big)\hat\mu_J(x)}$$

for a deterministic constant $c'_{a,b} > 0$ satisfying $c'_{a,b} < c_{a,b} \le \inf_{x\in[a,b]}u_1'(x)$.

Proof. Recall that

$$\hat\sigma^2_J(x) = \frac{2\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy}{\hat u'_{J,1}(x)\hat\mu_J(x)}\wedge D = \frac{2\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy}{\Big(\hat u'_{J,1}(x)\vee\frac{2\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy}{\hat\mu_J(x)D}\Big)\hat\mu_J(x)}.$$

Let $m = \frac12\inf\mu$ and $M = 2\sup\mu$. By Proposition 11, $m \le \hat\mu_J(x) \le M$ for all $x\in[0,1]$ on the
event $\mathcal T_0$, and in particular on

$$\mathcal T_5 := \mathcal T_4\cap\Big\{4\,\Big\|\hat\nu_{J,1}\int_0^{\cdot}\hat u_{J,1}(y)\hat\mu_J(y)\,dy - \nu_1\int_0^{\cdot}u_1(y)\mu(y)\,dy\Big\|_\infty \le d^2c_{a,b}m\Big\},$$

where $\mathcal T_4$ is the high-probability event from Corollary 19. On $\mathcal T_5$ it holds for $x\in[a,b]$

$$\frac{2\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy}{D\hat\mu_J(x)} = \frac{\sigma^2(x)u_1'(x)\mu(x) + 2\big[\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy - \nu_1\int_0^xu_1(y)\mu(y)\,dy\big]}{D\hat\mu_J(x)} \ge \frac{d^2c_{a,b}m}{2MD} =: c'_{a,b}.$$

Furthermore, by Corollary 19, using Markov's and the triangle inequality, it is easy to check that
$\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_5) \lesssim N^{-2s/(2s+3)}$, cf. estimate (25) below. □


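Proof for the volatility estimator. Set 0 < a < b < 1. Note first that since $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_5) \lesssim N^{-2s/(2s+3)}$
and $\sigma^2$, $\hat\sigma^2_J$ are bounded, we just have to verify that $\mathbb E_{\sigma,b,\gamma}[\mathbf 1_{\mathcal T_5}\|\hat\sigma^2_J-\sigma^2\|^2_{L^2([a,b])}] \lesssim N^{-2s/(2s+3)}$. Denote

$$\tilde u'_{J,1}(x) := \hat u'_{J,1}(x)\vee c'_{a,b}\quad\text{and}\quad\hat d^2(x) := 2\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy.$$

Since for $x\in[a,b]$ the functions $u_1'\mu$ and $\tilde u'_{J,1}\hat\mu_J$ are uniformly separated from zero, we have on $\mathcal T_5$, by Lemma 20,

$$|\sigma^2(x)-\hat\sigma^2_J(x)| = \Big|\frac{2\nu_1\int_0^xu_1(y)\mu(y)\,dy}{u_1'(x)\mu(x)} - \frac{2\hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy}{\tilde u'_{J,1}(x)\hat\mu_J(x)}\Big|$$

$$= \Big|\frac{2\big(\nu_1\int_0^xu_1(y)\mu(y)\,dy - \hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy\big)}{u_1'(x)\mu(x)} + \hat d^2(x)\,\frac{\tilde u'_{J,1}(x)\hat\mu_J(x) - u_1'(x)\mu(x)}{u_1'(x)\mu(x)\,\tilde u'_{J,1}(x)\hat\mu_J(x)}\Big|$$

$$\lesssim \Big|\nu_1\int_0^xu_1(y)\mu(y)\,dy - \hat\nu_{J,1}\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy\Big| + |\hat d^2(x)|\,\big|u_1'(x)\mu(x) - \tilde u'_{J,1}(x)\hat\mu_J(x)\big| =: A_1(x) + A_2(x).$$

Observe that since $\hat\mu_J$ is uniformly bounded on the event $\mathcal T_5$ and the eigenfunction $\hat u_{J,1}$ is
normalized, the Cauchy-Schwarz inequality grants that $\int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy$ is uniformly bounded.
Hence,

$$A_1(x) = \Big|\nu_1\Big(\int_0^xu_1(y)\mu(y)\,dy - \int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy\Big) + \int_0^x\hat u_{J,1}(y)\hat\mu_J(y)\,dy\,(\nu_1-\hat\nu_{J,1})\Big|$$

$$\lesssim \Big|\int_0^xu_1(y)\big(\mu(y)-\hat\mu_J(y)\big)\,dy\Big| + \Big|\int_0^x\big(u_1(y)-\hat u_{J,1}(y)\big)\hat\mu_J(y)\,dy\Big| + |\nu_1-\hat\nu_{J,1}|$$

$$\le \|u_1\|_{L^2}\|\mu-\hat\mu_J\|_{L^2} + \|u_1-\hat u_{J,1}\|_{L^2}\|\hat\mu_J\|_{L^2} + |\nu_1-\hat\nu_{J,1}| \lesssim \|\mu-\hat\mu_J\|_{L^2} + \|u_1-\hat u_{J,1}\|_{L^2} + |\nu_1-\hat\nu_{J,1}|. \qquad (25)$$

Furthermore, since $\hat d^2(x)$ is uniformly bounded on $\mathcal T_5$ and $u_1' \ge c_{a,b} > c'_{a,b}$ on [a, b],

$$A_2(x) \lesssim |\mu(x)-\hat\mu_J(x)| + |u_1'(x)-\tilde u'_{J,1}(x)| \le |\mu(x)-\hat\mu_J(x)| + |u_1'(x)-\hat u'_{J,1}(x)|. \qquad (26)$$

Consequently, by Proposition 11 and Corollary 19,

$$\mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_5}\|\hat\sigma^2_J-\sigma^2\|^2_{L^2([a,b])}\big] \lesssim \mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_5}\big(\|A_1\|^2_{L^2([a,b])} + \|A_2\|^2_{L^2([a,b])}\big)\big]$$

$$\lesssim \mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_5}\big(\|\mu-\hat\mu_J\|^2_{L^2} + \|u_1-\hat u_{J,1}\|^2_{H^1} + |\nu_1-\hat\nu_{J,1}|^2\big)\big] \lesssim N^{-2s/(2s+3)}.$$

□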
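The spectral identity behind the volatility estimator, $\sigma^2(x) = 2\nu_1\int_0^xu_1(y)\mu(y)\,dy\,/\,(u_1'(x)\mu(x))$, can be sanity-checked on an example with known spectrum. The sketch assumes reflected Brownian motion on [0,1] (σ = 1, b = 0, μ ≡ 1), for which $\nu_1 = -\pi^2/2$ and the increasing, $L^2$-normalized eigenfunction is $u_1(x) = -\sqrt2\cos(\pi x)$; the example and the function name are ours.

```python
import numpy as np


def volatility_from_spectrum(x, nu1, u1, du1, mu):
    """sigma^2(x) = 2 nu1 int_0^x u1(y) mu(y) dy / (u1'(x) mu(x)) on a grid x starting at 0.

    The integral is a cumulative trapezoidal sum; points where u1'(x) mu(x) = 0
    (here the boundary) give nan/inf and should be discarded by the caller.
    """
    f = u1(x) * mu(x)
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))
    with np.errstate(divide="ignore", invalid="ignore"):
        return 2.0 * nu1 * integral / (du1(x) * mu(x))


x = np.linspace(0.0, 1.0, 2001)
sigma2 = volatility_from_spectrum(
    x,
    nu1=-np.pi**2 / 2,
    u1=lambda t: -np.sqrt(2.0) * np.cos(np.pi * t),
    du1=lambda t: np.sqrt(2.0) * np.pi * np.sin(np.pi * t),
    mu=np.ones_like,
)
inner = (x >= 0.1) & (x <= 0.9)   # stay away from the boundary, as in the proof
```

The division by $u_1'$ explains why the analysis is restricted to [a, b] with 0 < a < b < 1: near the boundary $u_1'$ vanishes and the ratio becomes unstable.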


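Proof for the drift estimator. To obtain the upper bound on the drift term, first note that using
Bernstein's inequality we can extend the proofs of Propositions 12 and 17 to obtain

$$\mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_4}\|\hat u_{J,1}-u_1\|^2_{H^2}\big] \lesssim N^{-\frac{2(s-1)}{2s+3}}. \qquad (27)$$

Let $\mathcal T_6 = \mathcal T_5\cap\{\inf_{x\in[a,b]}\hat u'_{J,1}(x) \ge c_{a,b}/2\}\cap\{\|\hat u_{J,1}\|_{H^2} \le 2\|u_1\|_{H^2}\}$. By Lemma 20 and (27) we
obtain that $\mathbb P_{\sigma,b,\gamma}(\Omega\setminus\mathcal T_6) \lesssim N^{-\frac{2(s-1)}{2s+3}}$. Since both $\hat b_J$ and b are bounded in $L^2$, we can restrict
the error analysis to the high-probability event $\mathcal T_6$. Recall the definition of $\hat b_J$ from (fl^l) and denote by $\tilde b_J$ the
estimator before truncation. Since $\|b\|_{L^\infty([a,b])} \le D$, we have $\|\hat b_J-b\|_{L^2([a,b])} \le \|\tilde b_J-b\|_{L^2([a,b])}$. Consequently, it remains to show

$$\mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_6}\|\tilde b_J-b\|^2_{L^2([a,b])}\big] \lesssim N^{-\frac{2(s-1)}{2s+3}}.$$

On $\mathcal T_6$, for $x\in[a,b]$ we have

$$|\tilde b_J(x)-b(x)| = \Big|\frac{\hat\nu_{J,1}\hat u_{J,1}(x)}{\hat u'_{J,1}(x)} - \frac{\hat\sigma^2_J(x)\hat u''_{J,1}(x)}{2\hat u'_{J,1}(x)} - \frac{\nu_1u_1(x)}{u_1'(x)} + \frac{\sigma^2(x)u_1''(x)}{2u_1'(x)}\Big|$$

$$\le |\hat u'_{J,1}(x)|^{-1}\Big(|\hat\nu_{J,1}\hat u_{J,1}(x) - \nu_1u_1(x)| + \tfrac12|\hat\sigma^2_J(x)\hat u''_{J,1}(x) - \sigma^2(x)u_1''(x)| + |b(x)|\,|u_1'(x) - \hat u'_{J,1}(x)|\Big),$$

where we used $\nu_1u_1 - \frac12\sigma^2u_1'' = bu_1'$. The uniform lower bound on $|\hat u'_{J,1}|$ yields

$$\|\tilde b_J-b\|_{L^2([a,b])} \lesssim \|\hat\nu_{J,1}\hat u_{J,1} - \nu_1u_1\|_{L^2([a,b])} + \|\hat\sigma^2_J\hat u''_{J,1} - \sigma^2u_1''\|_{L^2([a,b])} + \|b\|_{L^2([a,b])}\|\hat u'_{J,1} - u_1'\|_{L^\infty([a,b])} =: B_1 + B_2 + B_3.$$

We will estimate these three terms separately. Corollary 19 and the normalization of $\hat u_{J,1}$ yield

$$\mathbb E_{\sigma,b,\gamma}[\mathbf 1_{\mathcal T_6}B_1^2] \lesssim \mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_6}\big(|\hat\nu_{J,1}-\nu_1|^2\|\hat u_{J,1}\|^2_{L^2} + |\nu_1|^2\|\hat u_{J,1}-u_1\|^2_{L^2}\big)\big] \lesssim N^{-2s/(2s+3)}.$$

The second term can be decomposed into

$$B_2^2 \le 2\|\hat\sigma^2_J-\sigma^2\|^2_\infty\|u_1''\|^2_{L^2} + 2\|\hat\sigma^2_J\|^2_\infty\|\hat u''_{J,1} - u_1''\|^2_{L^2}.$$

From (25) and (26) we can easily verify that

$$\|\hat\sigma^2_J-\sigma^2\|_\infty \lesssim |\hat\nu_{J,1}-\nu_1| + \|\hat u_{J,1}-u_1\|_{H^2} + \|\hat\mu_J-\mu\|_{H^1}.$$

Since $\hat\sigma^2_J$ is bounded by construction, we conclude

$$\mathbb E_{\sigma,b,\gamma}[\mathbf 1_{\mathcal T_6}B_2^2] \lesssim \mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_6}\big(|\hat\nu_{J,1}-\nu_1|^2 + \|\hat u_{J,1}-u_1\|^2_{H^2} + \|\hat\mu_J-\mu\|^2_{H^1}\big)\big] \lesssim N^{-\frac{2(s-1)}{2s+3}}.$$

For the last term it holds

$$\mathbb E_{\sigma,b,\gamma}[\mathbf 1_{\mathcal T_6}B_3^2] \lesssim \mathbb E_{\sigma,b,\gamma}\big[\mathbf 1_{\mathcal T_6}\|b\|^2_{L^2([a,b])}\|\hat u_{J,1}-u_1\|^2_{H^2}\big] \lesssim N^{-\frac{2(s-1)}{2s+3}},$$

since $\|b\|_{L^2([a,b])}$ is uniformly bounded. □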
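The drift formula used above is the eigenvalue equation $\frac12\sigma^2u_1'' + bu_1' = \nu_1u_1$ solved for b. The following sketch checks it on the same assumed example as for the volatility (reflected Brownian motion with σ = 1, whose drift is zero); the function name is hypothetical.

```python
import numpy as np


def drift_from_spectrum(x, nu1, u1, du1, d2u1, sigma2):
    """b(x) = (nu1 u1(x) - sigma^2(x) u1''(x) / 2) / u1'(x), i.e. L u1 = nu1 u1 solved for b."""
    return (nu1 * u1(x) - 0.5 * sigma2(x) * d2u1(x)) / du1(x)


x = np.linspace(0.1, 0.9, 81)
b = drift_from_spectrum(
    x,
    nu1=-np.pi**2 / 2,
    u1=lambda t: -np.sqrt(2.0) * np.cos(np.pi * t),
    du1=lambda t: np.sqrt(2.0) * np.pi * np.sin(np.pi * t),
    d2u1=lambda t: np.sqrt(2.0) * np.pi**2 * np.cos(np.pi * t),
    sigma2=np.ones_like,
)
```

Since the formula involves $u_1''$, errors in the eigenfunction enter in the $H^2$ norm, which is why the drift rate $N^{-2(s-1)/(2s+3)}$ is slower than the volatility rate.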


8 Proof of the lower bounds 

First note that estimating the sampling distribution γ has no impact on the convergence rates,
because the Laplace transform can be estimated with the parametric rate. Therefore, it suffices
to use the same distribution γ ∈ Γ for all alternatives. Throughout this section we thus fix some
γ ∈ Γ which admits a bounded Lebesgue density on [0, T] for some T > 0.

Without loss of generality we can suppose that $(1,0)\in\Theta_s$. To construct the alternatives,
let ψ be a compactly supported wavelet in $H^s$ with one vanishing moment. We set $\psi_{jk}(x) = 2^{j/2}\psi(2^jx-k)$
and denote by $K_j\subset\mathbb Z$ a maximal set of indices k such that $\operatorname{supp}(\psi_{jk})\subset[a,b]$
and $\operatorname{supp}(\psi_{jk})\cap\operatorname{supp}(\psi_{jk'}) = \emptyset$ holds for all $k,k'\in K_j$, $k\neq k'$. For a constant δ > 0 and all
$\varepsilon = (\varepsilon_k)\in\{-1,1\}^{K_j}$ we define

$$S_\varepsilon(x) = S_\varepsilon(j,x) = \Big(2 + \delta\sum_{k\in K_j}\varepsilon_k\psi_{jk}(x)\Big)^{-1}.$$

Choosing $\delta \sim 2^{-j(s+1/2)}$ yields $(\sqrt{2S_\varepsilon},S_\varepsilon')\in\Theta_s$. The corresponding diffusions $X^{(\varepsilon)}$ are defined
by their generators

$$L_\varepsilon f(x) = S_\varepsilon(x)f''(x) + S_\varepsilon'(x)f'(x),\qquad\operatorname{dom}(L_\varepsilon) = \operatorname{dom}(L).$$

Note that for any ε the invariant measure of $X^{(\varepsilon)}$ is given by the Lebesgue measure on [0, 1]. For ε, ε'
with $\|\varepsilon-\varepsilon'\|_{\ell^1} = 2$, i.e. differing only in one index k, we have

$$S_{\varepsilon'}(x) - S_\varepsilon(x) = \pm2\delta\psi_{jk}(x)S_{\varepsilon'}(x)S_\varepsilon(x).$$
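The last identity is a one-line computation. Writing $a_\varepsilon := \delta\sum_{k\in K_j}\varepsilon_k\psi_{jk}$, so that $S_\varepsilon = (2+a_\varepsilon)^{-1}$, and letting k be the single index where ε and ε' differ:

```latex
S_{\varepsilon'}-S_\varepsilon
  = \frac{1}{2+a_{\varepsilon'}}-\frac{1}{2+a_\varepsilon}
  = \frac{a_\varepsilon-a_{\varepsilon'}}{(2+a_\varepsilon)(2+a_{\varepsilon'})}
  = (a_\varepsilon-a_{\varepsilon'})\,S_\varepsilon S_{\varepsilon'},
\qquad
a_\varepsilon-a_{\varepsilon'} = \delta(\varepsilon_k-\varepsilon'_k)\,\psi_{jk} = \pm 2\delta\,\psi_{jk}.
```

Because $S_\varepsilon S_{\varepsilon'} \to 1/4$ uniformly, $2|S_{\varepsilon'}-S_\varepsilon| \approx \delta|\psi_{jk}|$, and since $\|\psi_{jk}\|_{L^2} = \|\psi\|_{L^2}$ this gives the lower bound of order δ for the volatility distance used next.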

Since $S_\varepsilon$, $S_{\varepsilon'}$ converge uniformly to 1/2 as j → ∞, the $L^2$-distances of the volatility functions and
the drift functions of the alternatives ε and ε' are bounded from below by

$$\|2S_\varepsilon - 2S_{\varepsilon'}\|_{L^2} \gtrsim \delta,\qquad \|S_\varepsilon' - S_{\varepsilon'}'\|_{L^2} \gtrsim 2^j\delta.$$

Therefore, Assouad's lemma and $\delta \sim 2^{-j(s+1/2)}$ yield for all estimators $\hat\sigma^2$ and $\hat b$

$$\sup_{(\sigma,b)\in\Theta_s}\mathbb E_{\sigma,b,\gamma}\big[\|\hat\sigma^2-\sigma^2\|^2_{L^2([a,b])}\big] \gtrsim 2^j\delta^2 = 2^{-2sj},$$

$$\sup_{(\sigma,b)\in\Theta_s}\mathbb E_{\sigma,b,\gamma}\big[\|\hat b-b\|^2_{L^2([a,b])}\big] \gtrsim 2^{3j}\delta^2 = 2^{-2(s-1)j}, \qquad (28)$$

provided the Kullback-Leibler divergence between the distributions of $(X^{(\varepsilon)}_{T_n})_{n=0,\dots,N}$ and $(X^{(\varepsilon')}_{T_n})_{n=0,\dots,N}$
remains uniformly bounded for all alternatives ε, ε' with $\|\varepsilon-\varepsilon'\|_{\ell^1} = 2$.

To bound the Kullback-Leibler divergence, we have to take into account the random observation
times. Denote the transition density of $(X_t)_{t\ge0}$ by $p_t(x,y)\,dy = \mathbb P_{\sigma,b}(X_t\in dy\,|\,X_0 = x)$ for
$x,y\in[0,1]$, t > 0. By the independence of the observation time τ and the process X we have

$$Rf(x) = \mathbb E_{\sigma,b,\gamma}[f(X_\tau)\,|\,X_0 = x] = \int_0^\infty P_tf(x)\,\gamma(dt) = \int_0^\infty\int_0^1p_t(x,y)f(y)\,dy\,\gamma(dt).$$

For one-dimensional diffusions with bounded drift and differentiable volatility which is uniformly
separated from zero, we know that

$$p_t(x,y) \le c_0\big(1 + t^{-1/2}\big)$$

with $c_0 > 0$ depending only on the bounds for the drift and volatility (see Qian and Zheng [24,
Thm. 1]). The assumption $\mathbb E_\gamma[\tau^{-1/2}] < \infty$ thus ensures that

$$r(x,y) = \int_0^\infty p_t(x,y)\,\gamma(dt)$$

is a well-defined kernel of the operator R. We obtain the following generalization of Proposition 6.4 in [0]:

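Lemma 21. Assume $\mathbb E_\gamma[\tau^{-1/2}] < \infty$. If $(\sigma_n,b_n)\in\Theta_s$, $n \ge 0$, are such that

$$\lim_{n\to\infty}\|\sigma_n-\sigma_0\| = 0\quad\text{and}\quad\lim_{n\to\infty}\|b_n-b_0\| = 0,$$

then the corresponding kernels $r^{(n)}(x,y)\,dy = \mathbb P_{\sigma_n,b_n}(X_\tau\in dy\,|\,X_0 = x)$ satisfy

$$\lim_{n\to\infty}\|r^{(n)}-r^{(0)}\|_\infty = 0.$$

Note that the bounded Lebesgue density of γ near the origin in particular ensures that $\mathbb E_\gamma[\tau^{-1/2}] < \infty$.

Proof. We have

$$\|r^{(n)}-r^{(0)}\|_\infty = \sup_{x,y\in[0,1]}\Big|\int_0^\infty\big(p_t^{(n)}(x,y)-p_t^{(0)}(x,y)\big)\,\gamma(dt)\Big| \le \int_0^\infty\|p_t^{(n)}-p_t^{(0)}\|_\infty\,\gamma(dt).$$

By [0, Prop. 6.4] the integrand tends to zero for every t > 0. Due to the bound $\|p_t^{(n)}\|_\infty \lesssim 1 + t^{-1/2}$ and
$\mathbb E_\gamma[\tau^{-1/2}] < \infty$, dominated convergence yields that the right-hand side tends to zero. □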

Exactly as in [13, Sect. 5.2], this lemma allows us to bound the Kullback-Leibler divergence
by $N\|r_{\varepsilon'}-r_\varepsilon\|^2_{L^2([0,1]^2)}$ for the kernels $r_{\varepsilon'}$ and $r_\varepsilon$ of $R_{\varepsilon'}$ and $R_\varepsilon$, respectively, for any ε, ε' with
$\|\varepsilon-\varepsilon'\|_{\ell^1} = 2$. Note that $\|r_{\varepsilon'}-r_\varepsilon\|_{L^2([0,1]^2)}$ is the Hilbert-Schmidt norm distance $\|R_\varepsilon-R_{\varepsilon'}\|_{HS} =
\|(R_\varepsilon-R_{\varepsilon'})|_V\|_{HS}$, where

$$V := \Big\{f\in L^2([0,1]) : \int_0^1f(x)\,dx = 0\Big\}.$$

We will bound the Hilbert-Schmidt norm by the difference of the inverses of the generators, which
are, in contrast to the generators themselves, bounded operators. Recall that $R = \mathcal L_\gamma(-L)$ for the
Laplace transform $\mathcal L_\gamma(z) = \int_0^\infty e^{-tz}\gamma(dt)$, z ≥ 0. By the functional calculus for operators, the
function $f(z) = \mathcal L_\gamma(-z^{-1})$ maps $(L_\varepsilon|_V)^{-1}$ to $R_\varepsilon|_V$. Furthermore, f is uniformly Lipschitz on
$(-\infty,0)$:
Lemma 22. Suppose that γ ∈ Γ admits a bounded Lebesgue density on [0, T] for some T > 0.
Then we have

$$c := \sup_{z<0}\frac{1}{z^2}\int_0^\infty te^{t/z}\,\gamma(dt) < \infty.$$

Proof. We decompose

$$\sup_{z<0}\frac{1}{z^2}\int_0^\infty te^{t/z}\,\gamma(dt) \le \sup_{z<0}\frac{1}{z^2}\int_0^Tte^{t/z}\,\gamma(dt) + \sup_{z<0}\frac{1}{z^2}\int_T^\infty te^{t/z}\,\gamma(dt) =: S_1 + S_2.$$

Due to the bounded Lebesgue density on [0, T], we estimate the first term by substituting s = t/z:

$$S_1 \lesssim \sup_{z<0}\frac{1}{z^2}\int_0^Tte^{t/z}\,dt = \sup_{z<0}\int_{T/z}^0|s|e^s\,ds = \int_{-\infty}^0|s|e^s\,ds = 1 < \infty.$$

For the second term note that the function $g_a(x) = x^2e^{-ax}$ attains its maximum at x = 2/a and
$g_a(2/a) = 4a^{-2}e^{-2}$. Consequently,

$$S_2 \le \sup_{z<0}\int_T^\infty t\,g_t(|z|^{-1})\,\gamma(dt) \le \int_T^\infty\frac{4}{te^2}\,\gamma(dt) \le \frac{4}{Te^2} < \infty.$$

□
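As an illustration of Lemma 22, take the exponential law with mean m (our example; it has a bounded density near the origin). With $x = 1/|z|$ the quantity inside the supremum is $x^2\int_0^\infty te^{-tx}\gamma(dt) = mx^2/(mx+1)^2$, which increases to $1/m$, so c = 1/m. The sketch compares the closed form with trapezoidal quadrature; function names are hypothetical.

```python
import numpy as np


def lipschitz_integrand_closed(x, m):
    """x^2 * int_0^inf t e^{-t x} gamma(dt) for gamma = Exp(mean m), with x = 1/|z|."""
    return m * x**2 / (m * x + 1.0) ** 2


def lipschitz_integrand_numeric(x, m, n=200001):
    """Same quantity by trapezoidal quadrature of t e^{-t x} e^{-t/m} / m."""
    rate = x + 1.0 / m
    t = np.linspace(0.0, 60.0 / rate, n)   # e^{-60} makes the truncated tail negligible
    f = t * np.exp(-rate * t) / m
    return x**2 * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))


m = 0.1
xs = np.array([0.5, 1.0, 5.0, 20.0])
numeric = np.array([lipschitz_integrand_numeric(x, m) for x in xs])
closed = lipschitz_integrand_closed(xs, m)
```

In this example the supremum is approached as z tends to 0 from below, which corresponds to the small-time part $S_1$ in the proof; the bounded density near zero is what keeps it finite.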


We conclude

$$\|r_{\varepsilon'}-r_\varepsilon\|_{L^2([0,1]^2)} = \|(R_\varepsilon-R_{\varepsilon'})|_V\|_{HS} \le c\,\big\|(L_\varepsilon|_V)^{-1} - (L_{\varepsilon'}|_V)^{-1}\big\|_{HS} \lesssim 2^{-j(2s+3)/2}$$

by the estimate for the difference of inverses of the generators that was established in [14, Sect.
5.3]. In order to bound $N\|r_{\varepsilon'}-r_\varepsilon\|^2_{L^2([0,1]^2)}$, we thus choose j such that $2^j \sim N^{1/(2s+3)}$. In view
of (28) we have proven Theorem [HJ]. □


9 Proof for the adaptive estimator 


In order to show that Lepski's method works, we need the following concentration result. It
slightly generalizes the corresponding concentration inequalities by Nickl and Söhl [23, Theorems
10 and 11] from a reflected diffusion observed at low frequency to random sampling times.

Proposition 23. Grant Assumptions [T] and […] with s > 5/2, and let γ ∈ Γ with $\mathbb E_\gamma[\tau^{-1/2}] \le D$. There
is a constant c > 0 depending only on d, D, I and a such that, for any z > 0, N ∈ ℕ and any
$f\in L^2(\mathbb R)\cap L^\infty(\mathbb R)$, $g\in L^2(\mathbb R^2)\cap L^\infty(\mathbb R^2)$,

$$\mathbb P_{\sigma,b,\gamma}\Big(\Big|\sum_{n=0}^N\big(f(X_{T_n}) - \mathbb E_{\sigma,b,\gamma}[f(X_0)]\big)\Big| > z\Big) \le \exp\Big(-c\min\Big\{\frac{z^2}{N\|f\|^2_{L^2}},\ \frac{z}{(\log N)\|f\|_\infty}\Big\}\Big)$$

and

$$\mathbb P_{\sigma,b,\gamma}\Big(\Big|\sum_{n=0}^{N-1}\big(g(X_{T_n},X_{T_{n+1}}) - \mathbb E_{\sigma,b,\gamma}[g(X_0,X_{T_1})]\big)\Big| > z\Big) \le \exp\Big(-c\min\Big\{\frac{z^2}{N\|g\|^2_{L^2}},\ \frac{z}{(\log N)\|g\|_\infty}\Big\}\Big).$$

Proof. The conditions of the Markov chain concentration result by Adamczak [1, Theorem 6] have
to be verified. This can be done along the lines of the proofs in [23] using Lemma [3] and noting that
the transition density of the time-changed chain $(X_{T_n})_{n\ge1}$ is given by $p^1(x,y) = \int_0^\infty p_t(x,y)\,\gamma(dt)$,
where $p_t(x,y)$ denotes the transition density of the diffusion $(X_t)_{t\ge0}$. The condition s > 5/2
ensures that the transition density $p^1$ is bounded from below uniformly on $[0,1]^2$. Indeed,
$p^1(x,y) \ge K\gamma(I) \ge Ka$, where K is the uniform lower bound on $\inf_{t\in I}p_t$ obtained in [23,
Proposition 9]. Since $\|p_t\|_\infty \lesssim 1 + t^{-1/2}$, the condition $\mathbb E_\gamma[\tau^{-1/2}] < \infty$ ensures a uniform upper
bound on $p^1$. □
To analyze the performance of $\hat\sigma^2_J$, we first decompose its estimation error into a deterministic
and a stochastic error term. In what follows, C = C(d, D, I, a) denotes a numeric constant which
may vary from line to line. We deduce from the proof of Theorem [7] that, on the event $\mathcal T_5$ defined there,
for any $J\in\mathcal J_N$

$$\|\hat\sigma^2_J-\sigma^2\|_{L^2} \le C\big(\|\mu-\hat\mu_J\|_{L^2} + \|u_1-\hat u_{J,1}\|_{H^1} + |\nu_1-\hat\nu_{J,1}|\big)$$

$$\le C\big(\|\mu-\hat\mu_J\|_{L^2} + \|u_1-\hat u_{J,1}\|_{H^1} + |\kappa_1-\hat\kappa_{J,1}| + |\hat{\mathcal L}(-\nu_1)-\mathcal L_\gamma(-\nu_1)|\big)$$

$$\le D_J + S_J, \qquad (29)$$

where

$$D_J := C\big(\|(I-\pi_J)\mu\|_{L^2} + \|u_1-u_{J,1}\|_{H^1} + |\kappa_1-\kappa_{J,1}|\big),$$

$$S_J := C\big(\|\pi_J\mu-\hat\mu_J\|_{L^2} + \|u_{J,1}-\hat u_{J,1}\|_{H^1} + |\kappa_{J,1}-\hat\kappa_{J,1}| + |\hat{\mathcal L}(-\nu_1)-\mathcal L_\gamma(-\nu_1)|\big).$$

Due to the smoothness of the invariant measure, Jackson's inequality and Proposition 12, there is
some β > 0, depending on ψ, d and D, such that

$$D_J \le \beta2^{-Js}.$$

We need that $S_J$ concentrates around zero. Recalling the definition of the residual vector

$$r = (\hat R_J-R_J)u_{J,1} + \kappa_{J,1}(G_J-\hat G_J)u_{J,1},$$

Bernstein's inequality and Theorem [2H] on generalized symmetric eigenvalue problems yield, on
the event $\mathcal T_2$ from Proposition 17, that

$$\|\hat u_{J,1}-u_{J,1}\|_{H^1} + |\hat\kappa_{J,1}-\kappa_{J,1}| \le C2^J\|\hat u_{J,1}-u_{J,1}\|_{L^2} + |\hat\kappa_{J,1}-\kappa_{J,1}| \le C\|r\|_{L^2}(2^J+1).$$

Corollary 24. Under the conditions of Proposition \UM for any r > 1 there exist 771 , 772,773 > 1, 
such that, for all J with 2 J < ,, n , we have 


'<7,6,7 ( — ZvIU 2 > 2 2 771 ^ 


I log log N \ 


N J 


< 


(log N) T , 


P (T , 6 , 7 (||r || i 2 > 2**^*^) < (log N)~t, 

<(log N)~ T . 


ct, 6,7 ( |G 7 ( Ui) £ 7 ( 77l)| > 773 ^ 


I log log N \ 
~N ) 


(30) 

(31) 

(32) 


In particular, there is a A > 0 such that P (Ti 6 j7 (4>S'j > sj) < (log IV) T /or sj = sj( A) /rora (ZZP- 
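Proof. Fix $\tau > 1$. Since $\|\psi_\lambda\|_\infty \le 2^{|\lambda|/2}$ for $|\lambda| \le J$, Proposition 23 yields

$$\begin{aligned}
\mathbb P_{\sigma,b,\gamma}\Big(|\langle\psi_\lambda, \mu - \hat\mu_N\rangle| > \eta_1\Big(\frac{\log\log N}{N}\Big)^{1/2}\Big)
&\le \exp\Big(-C\min\Big\{\frac{\eta_1^2\log\log N}{\|\psi_\lambda\|_{L^2}^2}, \frac{\eta_1\sqrt{N\log\log N}}{(\log N)\|\psi_\lambda\|_\infty}\Big\}\Big)\\
&\le \exp\Big(-C\eta_1\min\Big\{\log\log N, \frac{\sqrt{N\log\log N}}{(\log N)2^{J/2}}\Big\}\Big)\\
&\le (\log N)^{-C\eta_1} \le (\log N)^{-\tau}
\end{aligned}$$

for some $\eta_1$ big enough, where we used $\|\psi_\lambda\|_{L^2} = 1$, $\eta_1 > 1$ and, in the third step, the restriction on $J$. Applying a usual chaining argument, this concentration inequality carries over to $\max_{|\lambda|\le J}|\langle\psi_\lambda, \mu - \hat\mu_N\rangle|$, cf. [?, Theorem 2.1] and [23, Theorem 12]. Since $\|\hat\mu_J - \pi_J\mu\|_{L^2}^2 = \sum_{|\lambda|\le J}|\langle\psi_\lambda, \mu - \hat\mu_N\rangle|^2$ and there are of order $2^J$ indices with $|\lambda| \le J$ (the constant can be absorbed into $\eta_1$), it follows that

$$\mathbb P_{\sigma,b,\gamma}\Big(\|\hat\mu_J - \pi_J\mu\|_{L^2}^2 > 2^J\eta_1^2\,\frac{\log\log N}{N}\Big) \le \mathbb P_{\sigma,b,\gamma}\Big(\max_{|\lambda|\le J}|\langle\psi_\lambda, \mu - \hat\mu_N\rangle|^2 > \eta_1^2\,\frac{\log\log N}{N}\Big) \le (\log N)^{-\tau},$$

which is (30).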
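The Parseval identity $\|\hat\mu_J - \pi_J\mu\|_{L^2}^2 = \sum_{|\lambda|\le J}|\langle\psi_\lambda, \mu - \hat\mu_N\rangle|^2$ and the sup-norm bound $\|\psi_\lambda\|_\infty \le 2^{|\lambda|/2}$ can be illustrated numerically. The sketch below rests on simplifying assumptions that are not from the paper: it uses the Haar basis on $[0,1]$ (for which $\|\psi_{j,k}\|_\infty = 2^{j/2}$ exactly), i.i.d. uniform observations instead of a sampled diffusion, and the hypothetical helper names `haar_psi` and `empirical_coefficients`.

```python
import numpy as np

def haar_psi(j, k, x):
    """Hypothetical stand-in for psi_lambda: Haar wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    y = 2.0**j * x - k
    up = ((0.0 <= y) & (y < 0.5)).astype(float)
    down = ((0.5 <= y) & (y < 1.0)).astype(float)
    return 2.0 ** (j / 2) * (up - down)

def empirical_coefficients(sample, J):
    """Empirical coefficients <psi_{j,k}, mu_N> = (1/N) sum_n psi_{j,k}(X_n), levels 0 <= j < J."""
    return {(j, k): haar_psi(j, k, sample).mean() for j in range(J) for k in range(2**j)}

rng = np.random.default_rng(1)
N, J = 20_000, 6
sample = rng.uniform(0.0, 1.0, size=N)   # i.i.d. stand-in with invariant density mu = 1 on [0, 1]
coef = empirical_coefficients(sample, J)

# mu = 1 has vanishing wavelet coefficients, so by Parseval the squared L^2 error of the
# wavelet part equals the sum of the squared empirical coefficients.
sq_error = sum(c**2 for c in coef.values())

# Sup norms of psi_{j,0} evaluated on a fine grid that contains the point 0.
grid = np.linspace(0.0, 1.0, 4001)
sup_norms = {j: np.abs(haar_psi(j, 0, grid)).max() for j in range(J)}

print(len(coef), N * sq_error)   # number of indices and N times the squared error
print(sup_norms)
```

Each empirical coefficient has variance $1/N$ here and distinct coefficients are uncorrelated, so $N$ times the squared error is close to the number of indices, roughly $2^J$. This is the $2^J/N$ scaling in (30); the additional $\log\log N$ factor controls the exceptional probability.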
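To prove (31), note first that since $|\kappa_{J,1}| \le 1$, we have

$$\|r\|_{L^2} \le \|(\hat R_J - R_J)u_{J,1}\|_{L^2} + \|(\hat G_J - G_J)u_{J,1}\|_{L^2}.$$

By Proposition 17, $\|u_{J,1}\|_{L^2}, \|u_{J,1}\|_\infty \lesssim 1$ hold for $J$ big enough. Using the second inequality in Proposition 23 we obtain

$$\begin{aligned}
\mathbb P_{\sigma,b,\gamma}\Big(|\langle\psi_\lambda, (\hat R_J - R_J)u_{J,1}\rangle| > \eta_2\Big(\frac{\log\log N}{N}\Big)^{1/2}\Big)
&\le \exp\Big(-C\eta_2\min\Big\{\log\log N, \frac{\sqrt{N\log\log N}}{(\log N)2^{J/2}}\Big\}\Big)\\
&\le (\log N)^{-C\eta_2} \le (\log N)^{-\tau}
\end{aligned}$$

for $\eta_2$ big enough. Since $\|(\hat R_J - R_J)u_{J,1}\|_{L^2}^2 = \sum_{|\lambda|\le J}|\langle\psi_\lambda, (\hat R_J - R_J)u_{J,1}\rangle|^2$, we conclude again that

$$\mathbb P_{\sigma,b,\gamma}\Big(\|(\hat R_J - R_J)u_{J,1}\|_{L^2}^2 > \eta_2^2\,2^J\,\frac{\log\log N}{N}\Big) \le (\log N)^{-\tau}.$$

Arguing similarly, we also deduce $\mathbb P_{\sigma,b,\gamma}\big(\|(\hat G_J - G_J)u_{J,1}\|_{L^2}^2 > \eta_2^2\,2^J\,\frac{\log\log N}{N}\big) \le (\log N)^{-\tau}$, and thus (31) holds (after enlarging $\eta_2$ if necessary).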

The concentration inequality (32) follows from the classical Bernstein inequality. Indeed, we have

$$\hat{\mathcal L}_\gamma(-\nu_1) - \mathcal L_\gamma(-\nu_1) = \frac1N\sum_{n=1}^N\xi_n \quad\text{with}\quad \xi_n := e^{\nu_1\Delta_n} - \mathbb E_\gamma[e^{\nu_1\Delta_n}],$$

where, by Assumption L, the random variables $\xi_n$ are independent, centered and deterministically bounded by 2 (because $\nu_1 < 0$). Since $\operatorname{Var}_\gamma(\xi_n) \le \mathcal L_\gamma(-2\nu_1) \le 1$, we can choose $\eta_3$ uniformly for all $\gamma \in \Gamma$. □
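The mechanism behind (32) can be reproduced in a few lines. Purely for illustration, the sketch assumes exponentially distributed sampling times $\Delta_n \sim \mathrm{Exp}(\theta)$, for which $\mathcal L_\gamma(z) = \mathbb E[e^{-z\Delta}] = \theta/(\theta + z)$, and a hypothetical value $\nu_1 = -0.7$; neither choice comes from the paper. It forms the empirical Laplace transform $\hat{\mathcal L}_\gamma(-\nu_1)$ and evaluates the classical Bernstein bound $2\exp(-Nt^2/(2(v + Mt/3)))$ with variance bound $v = 1$ and sup bound $M = 2$, as in the proof.

```python
import numpy as np

def laplace_exponential(z, theta):
    """Laplace transform E[exp(-z * Delta)] for Delta ~ Exp(rate=theta), valid for z > -theta."""
    return theta / (theta + z)

def empirical_laplace(z, deltas):
    """Empirical Laplace transform (1/N) * sum_n exp(-z * Delta_n)."""
    return np.mean(np.exp(-z * deltas))

def bernstein_tail(t, N, var_bound=1.0, sup_bound=2.0):
    """Classical Bernstein bound on P(|mean of N centered terms| > t) for |xi_n| <= sup_bound."""
    return 2.0 * np.exp(-N * t**2 / (2.0 * (var_bound + sup_bound * t / 3.0)))

theta, nu1 = 2.0, -0.7                  # hypothetical sampling rate and negative eigenvalue
rng = np.random.default_rng(2)
N = 50_000
deltas = rng.exponential(scale=1.0 / theta, size=N)   # numpy uses scale = 1 / rate

err = abs(empirical_laplace(-nu1, deltas) - laplace_exponential(-nu1, theta))
eta3 = 3.0
t = eta3 * np.sqrt(np.log(np.log(N)) / N)             # threshold from (32)
print(err, t, bernstein_tail(t, N))
```

With $t = \eta_3(\log\log N/N)^{1/2}$ the exponent is of order $\eta_3^2\log\log N$, so the tail bound behaves like $(\log N)^{-c\eta_3^2}$, which is why $\eta_3$ can be chosen large enough to reach $(\log N)^{-\tau}$.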

We can now prove the convergence rate for the adaptive estimator. 

Proof of Theorem 0. Let us introduce the oracle projection level

$$J^* := \min\{J \in \mathcal J_N : \beta 2^{-Js} \le s_J/4\}.$$

By the choice of $\mathcal J_N$ we deduce $2^{J^*} \sim (N/\log\log N)^{1/(2s+3)}$ and $s_{J^*} \sim (\log\log N/N)^{s/(2s+3)}$.
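The orders of $2^{J^*}$ and $s_{J^*}$ can be checked by a direct computation. The sketch assumes, as our reading of the threshold's definition, $s_J(\lambda) = \lambda\,2^{3J/2}(\log\log N/N)^{1/2}$; the values of $s$, $\beta$, $\lambda$ and the grid $\{0, \dots, J_{\max}\}$ standing in for $\mathcal J_N$ are hypothetical.

```python
import numpy as np

def threshold(J, N, lam):
    """Assumed threshold s_J(lam) = lam * 2^{3J/2} * sqrt(loglog N / N)."""
    return lam * 2.0 ** (1.5 * J) * np.sqrt(np.log(np.log(N)) / N)

def oracle_level(N, s, beta, lam, J_max):
    """J* = min{J in {0, ..., J_max} : beta * 2^{-J s} <= s_J / 4}."""
    for J in range(J_max + 1):
        if beta * 2.0 ** (-J * s) <= threshold(J, N, lam) / 4:
            return J
    raise ValueError("no admissible level in the grid")

s, beta, lam = 2.0, 1.0, 1.0            # hypothetical smoothness and constants
for N in [10**3, 10**5, 10**7]:
    J_star = oracle_level(N, s, beta, lam, J_max=30)
    rate = (np.log(np.log(N)) / N) ** (s / (2 * s + 3))
    # 2^{J*} relative to (N / loglog N)^{1/(2s+3)}, and s_{J*} relative to the rate
    print(N, J_star, 2**J_star / (N / np.log(np.log(N))) ** (1 / (2 * s + 3)),
          threshold(J_star, N, lam) / rate)
```

Solving $\beta 2^{-Js} = \frac14 s_J$ gives $2^{J(s+3/2)} = 4\beta/(\lambda\varepsilon_N)$ with $\varepsilon_N = (\log\log N/N)^{1/2}$, so the first printed ratio stays in $[c, 2c)$ with $c = (4\beta/\lambda)^{2/(2s+3)}$ for every $N$, and $s_{J^*}$ is then of order $(\log\log N/N)^{s/(2s+3)}$.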
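Since the number of elements in $\mathcal J_N$ is of order $\log N$ and $\tau > 1$, Proposition 23 and a union bound yield $\mathbb P_{\sigma,b,\gamma}(\mathcal A_N) \to 1$ for the event

$$\mathcal A_N := \{\forall J \in \mathcal J_N : 4S_J \le s_J\} \cap \mathcal T_\varepsilon$$

with $\mathcal T_\varepsilon$ from the proof of Theorem 7. Due to the decomposition (29), on $\mathcal A_N$ we have for every $J \in \mathcal J_N$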

$$\|\hat\sigma^2_J - \sigma^2\|_{L^2} \le D_J + S_J \le \beta 2^{-Js} + \tfrac14 s_J.$$

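Hence, for all $J \ge J^*$, $J \in \mathcal J_N$, using that $s_J$ is increasing in $J$ and thus $\beta 2^{-Js} \le \beta 2^{-J^*s} \le \frac14 s_{J^*} \le \frac14 s_J$, we obtain

$$\|\hat\sigma^2_J - \sigma^2\|_{L^2[a,b]} \le \tfrac12 s_J,$$

and thus, by the triangle inequality,

$$\|\hat\sigma^2_J - \hat\sigma^2_{J^*}\|_{L^2[a,b]} \le \tfrac12 s_J + \tfrac12 s_{J^*} \le s_J$$

for all $J \ge J^*$, $J \in \mathcal J_N$. By definition of $\hat J$, we conclude that $\hat J \le J^*$ on the event $\mathcal A_N$. Therefore, on $\mathcal A_N$,

$$\|\hat\sigma^2_{\hat J} - \sigma^2\|_{L^2[a,b]} \le \|\hat\sigma^2_{\hat J} - \hat\sigma^2_{J^*}\|_{L^2[a,b]} + \|\hat\sigma^2_{J^*} - \sigma^2\|_{L^2[a,b]} \le s_{J^*} + \tfrac12 s_{J^*} = \tfrac32 s_{J^*}. \qquad\square$$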
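The selection step can be sketched as follows. The definition of $\hat J$ is given earlier in the paper and is not reproduced in this section; the sketch assumes the Lepski-type form $\hat J = \min\{J \in \mathcal J_N : \|\hat\sigma^2_J - \hat\sigma^2_{J'}\|_{L^2[a,b]} \le s_{J'}$ for all $J' > J$, $J' \in \mathcal J_N\}$, which is the property the argument relies on. The estimators are synthetic and hypothetical, built so that $\|\hat\sigma^2_J - \sigma^2\| \le \beta 2^{-Js} + \frac14 s_J$ holds for every level, that is, so that the event $\mathcal A_N$ occurs deterministically.

```python
import numpy as np

def l2_norm(f, grid):
    """Discrete L^2[a, b] norm on a uniform grid (Riemann sum)."""
    h = grid[1] - grid[0]
    return np.sqrt(h * np.sum(f**2))

def lepski_level(estimates, thresholds, grid):
    """Smallest J with ||est_J - est_J'|| <= s_J' for every larger level J' (assumed rule)."""
    levels = sorted(estimates)
    for J in levels:
        if all(l2_norm(estimates[J] - estimates[Jp], grid) <= thresholds[Jp]
               for Jp in levels if Jp > J):
            return J
    return levels[-1]  # not reached: the largest level always qualifies

# Synthetic family with ||est_J - sigma^2|| <= beta 2^{-Js} + s_J / 4 (hypothetical numbers).
rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 2001)
sigma2 = 1.0 + 0.5 * np.sin(2 * np.pi * grid)
s, beta, lam, eps_N = 2.0, 1.0, 1.0, 1e-3
levels = range(0, 12)
thresholds = {J: lam * 2.0 ** (1.5 * J) * eps_N for J in levels}

bias_dir = np.sin(6 * np.pi * grid)
bias_dir /= l2_norm(bias_dir, grid)                      # unit-norm bias direction
estimates = {}
for J in levels:
    noise = rng.standard_normal(grid.size)
    noise *= (thresholds[J] / 4) / l2_norm(noise, grid)  # stochastic part of norm s_J / 4
    estimates[J] = sigma2 + beta * 2.0 ** (-J * s) * bias_dir + noise

J_star = min(J for J in levels if beta * 2.0 ** (-J * s) <= thresholds[J] / 4)
J_hat = lepski_level(estimates, thresholds, grid)
error = l2_norm(estimates[J_hat] - sigma2, grid)
print(J_hat, J_star, error, 1.5 * thresholds[J_star])
```

On such a family the two conclusions of the proof, $\hat J \le J^*$ and $\|\hat\sigma^2_{\hat J} - \sigma^2\| \le \frac32 s_{J^*}$, hold by construction and can be confirmed from the printed values.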
A Stability of the eigenvalue problems

A.1 Compact, self-adjoint, positive-definite operators

Theorem 25. Let $T$ be a compact, self-adjoint and positive-definite operator on some Hilbert space $H = (H, \|\cdot\|)$. Denote its eigenpairs by $(\lambda_i, x_i)_{i=1,2,\dots}$, normalized so that $\|x_i\| = 1$ and ordered decreasingly with respect to the eigenvalues. Let $V \subset H$ be a finite-dimensional subspace of $H$ and $\pi$ the orthogonal projection onto $V$. Assume that the largest eigenvalue $\lambda_1$ is simple and that

$$\|(I - \pi)x_1\| \le \frac{\lambda_1 - \lambda_2}{6\lambda_1}.$$

Consider the projected operator $\pi T\pi$ and denote its normalized eigenpairs, ordered decreasingly, by $(\lambda_i^\pi, x_i^\pi)_{i=1,\dots,\dim V}$, where the sign of $x_1^\pi$ is chosen such that $\langle x_1, x_1^\pi\rangle \ge 0$. Then

$$|\lambda_1 - \lambda_1^\pi| + \|x_1 - x_1^\pi\| \le C\|(I - \pi)x_1\|$$

holds, where the constant $C$ depends only on the size of the spectral gap $\lambda_1 - \lambda_2$ and on the first eigenvalue $\lambda_1$.

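Proof. Since $T$ is self-adjoint and positive-definite, $\|T\| = \sup_{x \ne 0}\langle x, Tx\rangle/\|x\|^2 = \lambda_1$. By the variational characterization of the eigenvalues (note that $\langle y, \pi T\pi y\rangle = \langle y, Ty\rangle$ for $y \in V$),

$$\lambda_i^\pi = \sup_{\substack{S \subset V\\ \dim S = i}}\ \inf_{y \in S\setminus\{0\}}\frac{\langle y, Ty\rangle}{\|y\|^2} \le \sup_{\substack{S \subset H\\ \dim S = i}}\ \inf_{y \in S\setminus\{0\}}\frac{\langle y, Ty\rangle}{\|y\|^2} = \lambda_i. \tag{33}$$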


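Furthermore, bounding $\lambda_1^\pi$ from below by the Rayleigh quotient of $\pi x_1$ and using $(\lambda_1 - T)x_1 = 0$ as well as $\langle(I - \pi)x_1, \pi x_1\rangle = 0$, we get

$$0 \le \lambda_1 - \lambda_1^\pi \le \frac{\langle(\lambda_1 - \pi T\pi)(\pi x_1), \pi x_1\rangle}{\|\pi x_1\|^2} = \frac{\langle\pi T(I - \pi)x_1, \pi x_1\rangle}{\|\pi x_1\|^2} \le \frac{\|\pi T(I - \pi)x_1\|}{\|\pi x_1\|} \le \frac{\|T\|\,\|(I - \pi)x_1\|}{1 - \|(I - \pi)x_1\|}.$$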

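Since $|\lambda_1 - \lambda_1^\pi| \le 2\|T\|$ and $z/(1 - z) \le 3z$ for $0 \le z \le 2/3$, distinguishing the cases $z \le 2/3$ and $z > 2/3$ for $z = \|(I - \pi)x_1\|$ yields

$$|\lambda_1 - \lambda_1^\pi| \le 3\|T\|\,\|(I - \pi)x_1\|.$$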

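Since $\lambda_2^\pi \le \lambda_2$ by (33) and $\|T\|\,\|(I - \pi)x_1\| \le (\lambda_1 - \lambda_2)/6$ by assumption, we have

$$|\lambda_1^\pi - \lambda_2^\pi| \ge \lambda_1^\pi - \lambda_2 = |\lambda_1 - \lambda_2| - |\lambda_1 - \lambda_1^\pi| \ge \lambda_1 - \lambda_2 - 3\|T\|\,\|(I - \pi)x_1\| \ge \tfrac12(\lambda_1 - \lambda_2).$$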

Consequently, the projected operator $\pi T\pi$ has a spectral gap of size at least $\rho := (\lambda_1 - \lambda_2)/2$ and, in particular, the eigenvalue $\lambda_1^\pi$ is simple. Define the residual vector $r = (\pi T\pi - T)x_1$. Then, using $Tx_1 = \lambda_1x_1$,

$$\|r\| = \|(\pi T\pi - T)x_1\| \le \|\pi T\pi x_1 - \pi Tx_1\| + \lambda_1\|\pi x_1 - x_1\| \le (\|T\| + \lambda_1)\|(I - \pi)x_1\|.$$

Consequently, in order to prove $\|x_1 - x_1^\pi\| \le C\|(I - \pi)x_1\|$, it suffices to show that

$$\|x_1 - x_1^\pi\| \le \frac{2\sqrt2}{\rho}\,\|r\|.$$
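Let $P$ be the spectral projection onto the eigenspace of the operator $\pi T\pi$ corresponding to the eigenvalue $\lambda_1^\pi$, and let $R(\pi T\pi, z) = (\pi T\pi - z)^{-1}$ be the resolvent operator. Denote by $S(\lambda_1, 3\rho/2)$ the circle of radius $3\rho/2$ around $\lambda_1$. Using Cauchy's integral representation of the spectral projection (see Lemma 6.4 in [9]) and $|\lambda_1 - \lambda_1^\pi| \le \rho$, together with $(\pi T\pi - \lambda_1)x_1 = (\pi T\pi - T)x_1 = r$, we find

$$\|x_1 - Px_1\| = \Big\|\frac{1}{2\pi}\oint_{S(\lambda_1, 3\rho/2)}\frac{R(\pi T\pi, z)}{\lambda_1 - z}\,dz\,(\pi T\pi - T)x_1\Big\| \le \frac{3\rho}{2}\,\|r\|\sup_{z \in S(\lambda_1, 3\rho/2)}\frac{\|R(\pi T\pi, z)\|}{|\lambda_1 - z|} = \|r\|\sup_{z \in S(\lambda_1, 3\rho/2)}\|R(\pi T\pi, z)\|.$$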

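Since the operator $\pi T\pi$ is self-adjoint on $H$, we know that (see Proposition 2.32 in [?]) $\|R(\pi T\pi, z)\| = (\operatorname{dist}(z, \sigma(\pi T\pi)))^{-1}$. Every $z \in S(\lambda_1, 3\rho/2)$ has distance at least $3\rho/2 - \rho = \rho/2$ from $\lambda_1^\pi$, and at least $2\rho - 3\rho/2 = \rho/2$ from the rest of $\sigma(\pi T\pi)$, which by (33) is contained in $[0, \lambda_2] = [0, \lambda_1 - 2\rho]$. Consequently

$$\sup_{z \in S(\lambda_1, 3\rho/2)}\|R(\pi T\pi, z)\| = \sup_{z \in S(\lambda_1, 3\rho/2)}(\operatorname{dist}(z, \sigma(\pi T\pi)))^{-1} \le \frac{2}{\rho}.$$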


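It remains to bound the distance between the eigenvectors. Since $x_1$ and $x_1^\pi$ are normalized and $0 \le \langle x_1^\pi, x_1\rangle \le 1$,

$$\|x_1^\pi - x_1\|^2 = 2 - 2\langle x_1^\pi, x_1\rangle \le 2 - 2\langle x_1^\pi, x_1\rangle^2 = 2\big(1 + \langle x_1^\pi, x_1\rangle\big)\big(1 - \langle x_1^\pi, x_1\rangle\big) = 2\|x_1 - \langle x_1^\pi, x_1\rangle x_1^\pi\|^2.$$

Since $\lambda_1^\pi$ is simple, the right-hand side equals $2\|x_1 - Px_1\|^2$. □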
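Theorem 25 can be sanity-checked in finite dimensions, where a symmetric positive-definite matrix is a compact, self-adjoint, positive-definite operator. The sketch below uses a hypothetical test matrix and subspace satisfying $\|(I - \pi)x_1\| \le (\lambda_1 - \lambda_2)/(6\lambda_1)$, and compares the errors with the explicit constants produced by the proof: $|\lambda_1 - \lambda_1^\pi| \le 3\lambda_1 z$ and $\|x_1 - x_1^\pi\| \le \frac{2\sqrt2}{\rho}(\|T\| + \lambda_1)z = \frac{8\sqrt2\,\lambda_1}{\lambda_1 - \lambda_2}z$, where $z = \|(I - \pi)x_1\|$.

```python
import numpy as np

def theorem25_check(seed, n=50, m=10, eps=1e-3):
    """Compare the projected top eigenpair of a test matrix T with the bounds of Theorem 25."""
    rng = np.random.default_rng(seed)
    lam = np.concatenate(([4.0, 2.0], np.linspace(1.0, 0.1, n - 2)))  # simple top eigenvalue
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    T = Q @ np.diag(lam) @ Q.T
    x1, lam1, lam2 = Q[:, 0], lam[0], lam[1]

    # V = span of a slightly perturbed x1 and m - 1 random directions; U has orthonormal columns.
    W = np.column_stack([x1 + eps * rng.standard_normal(n), rng.standard_normal((n, m - 1))])
    U, _ = np.linalg.qr(W)
    z = np.linalg.norm(x1 - U @ (U.T @ x1))          # ||(I - pi) x1||

    # Top eigenpair of pi T pi on V, computed from the compressed matrix U^T T U.
    evals, evecs = np.linalg.eigh(U.T @ T @ U)       # ascending order
    lam_pi, x_pi = evals[-1], U @ evecs[:, -1]
    if x_pi @ x1 < 0:                                # sign convention <x1, x1^pi> >= 0
        x_pi = -x_pi

    eig_bound = 3 * lam1 * z
    vec_bound = 8 * np.sqrt(2) * lam1 / (lam1 - lam2) * z
    return z, abs(lam1 - lam_pi), eig_bound, np.linalg.norm(x1 - x_pi), vec_bound

z, eig_err, eig_bound, vec_err, vec_bound = theorem25_check(seed=0)
print(f"z={z:.2e}  eig: {eig_err:.2e} <= {eig_bound:.2e}  vec: {vec_err:.2e} <= {vec_bound:.2e}")
```

The eigenvalue error is in fact of second order in $z$, so the first bound is far from sharp; the proof only needs linear dependence on $z$.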


A.2 Generalized symmetric eigenvalue problems

In this section we sketch the a posteriori technique for the error analysis of generalized symmetric eigenvalue problems (GSEPs). GSEPs have been studied extensively in Chapter VI of [30]. For the error analysis of standard matrix eigenvalue problems we refer to Chapter 1 of [9] or Chapter V of [30]. A particularly useful reference for various eigenvalue problems is [5].

Consider real symmetric matrices $A, B \in \mathbb R^{n\times n}$ with $B$ positive definite. We call a pair $(\lambda, x) \in \mathbb R \times (\mathbb R^n \setminus \{0\})$ an eigenpair of the generalized symmetric eigenvalue problem (GSEP) for the matrices $A, B$ if

$$Ax = \lambda Bx. \tag{34}$$

Furthermore, we adopt the terminology of standard eigenvalue problems, calling $\lambda$ the eigenvalue and $x$ the eigenvector. An eigenpair is normalized if $\|x\| = 1$, where $\|x\| = (\sum_{i=1}^n x_i^2)^{1/2}$ is the Euclidean norm on $\mathbb R^n$.

Using the Cholesky decomposition $B = DD^\top$, one can reduce the generalized problem (34) to the standard eigenvalue problem for the matrix $D^{-1}AD^{-\top}$. We deduce that problem (34) has $n$ solutions $(\lambda_i, x_i)_{i=1,\dots,n}$, that all eigenvalues are real, and that we can order the eigenpairs with respect to the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$. Furthermore, the corresponding eigenvectors $(x_i)_{i=1,\dots,n}$ form a $B$-orthogonal basis of $\mathbb R^n$.
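The Cholesky reduction described above can be sketched with NumPy alone. The helper name `gsep_cholesky` and the random test matrices are illustrative assumptions; library routines such as LAPACK's generalized symmetric-definite solvers perform the same reduction internally.

```python
import numpy as np

def gsep_cholesky(A, B):
    """Solve A x = lam B x (A symmetric, B symmetric positive definite) via B = D D^T.

    Returns eigenvalues in decreasing order and Euclidean-normalized eigenvectors (columns).
    """
    D = np.linalg.cholesky(B)                 # lower triangular, B = D @ D.T
    Dinv_A = np.linalg.solve(D, A)            # D^{-1} A
    C = np.linalg.solve(D, Dinv_A.T)          # D^{-1} A D^{-T}, since A is symmetric
    C = 0.5 * (C + C.T)                       # symmetrize against round-off
    evals, Y = np.linalg.eigh(C)              # standard symmetric problem, ascending order
    X = np.linalg.solve(D.T, Y)               # back-transform x = D^{-T} y
    order = np.argsort(evals)[::-1]           # lambda_1 >= ... >= lambda_n
    X = X[:, order]
    return evals[order], X / np.linalg.norm(X, axis=0)

rng = np.random.default_rng(4)
n = 6
M = rng.standard_normal((n, n))
A = M + M.T                                   # symmetric
K = rng.standard_normal((n, n))
B = K @ K.T + n * np.eye(n)                   # symmetric positive definite
lam, X = gsep_cholesky(A, B)
print(lam)
```

Since $x_i^\top Bx_j = y_i^\top D^{-1}DD^\top D^{-\top}y_j = y_i^\top y_j$, the back-transformed eigenvectors are $B$-orthogonal, and rescaling them to unit Euclidean norm keeps this property.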

Consider now perturbed matrices $\hat A, \hat B$ with $\hat B$ positive definite and the corresponding GSEP

$$\hat A\hat x = \hat\lambda\hat B\hat x. \tag{35}$$

We want to formulate error bounds between $(\lambda_1, x_1)$ and $(\hat\lambda_1, \hat x_1)$. To that purpose we form the residual vector

$$r = \hat Ax_1 - \lambda_1\hat Bx_1 = (\hat A - A)x_1 + \lambda_1(B - \hat B)x_1.$$

The standard a posteriori procedure is to find a matrix $E = E(\lambda_1, x_1)$ such that

$$(\hat A + E)x_1 = \lambda_1\hat Bx_1, \qquad \|E\| = \|r\|. \tag{36}$$

Since in (36) the matrix $B$ has been replaced by the perturbed matrix $\hat B$, the final step is to reduce (35) and (36) to standard eigenvalue problems using the Cholesky decomposition of $\hat B$. Then we can apply the standard error bounds expressed in terms of the perturbation matrix $E$. We obtain
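Theorem 26. There exists a normalized eigenpair $(\hat\lambda_i, \hat x_i)$, $1 \le i \le n$, of (35) such that

$$|\hat\lambda_i - \lambda_1| \le \|\hat B^{-1}\|\,\|r\|, \qquad \|\hat x_i - x_1\| \le \frac{2\sqrt2\,\kappa(\hat B)}{\delta(\lambda_1)}\,\|\hat B^{-1}\|\,\|r\|,$$

where $\kappa(\hat B) = \|\hat B\|\,\|\hat B^{-1}\|$ is the condition number of the matrix $\hat B$ and $\delta(\lambda_1)$ is the so-called localizing distance, i.e. $\delta(\lambda_1) = \min_{j \ne i}|\hat\lambda_j - \lambda_1|$.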
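The eigenvalue part of Theorem 26 can be checked numerically. The sketch perturbs a hypothetical pair $(A, B)$, forms the residual $r = \hat Ax_1 - \lambda_1\hat Bx_1$, and uses the rank-one matrix $E = -rx_1^\top$, which is one possible (non-symmetric) choice satisfying (36) with $\|E\| = \|r\|$ when $\|x_1\| = 1$; this choice, the test matrices and the perturbation size are assumptions made for illustration. The bound $\min_i|\hat\lambda_i - \lambda_1| \le \|\hat B^{-1}\|\,\|r\|$ is then compared with the actual distance.

```python
import numpy as np

def reduced_matrix(A, B):
    """Matrix D^{-1} A D^{-T} of the standard problem obtained from B = D D^T (Cholesky)."""
    D = np.linalg.cholesky(B)
    C = np.linalg.solve(D, np.linalg.solve(D, A).T)
    return D, 0.5 * (C + C.T)

def top_gsep_pair(A, B):
    """Largest eigenvalue and Euclidean-normalized eigenvector of A x = lam B x."""
    D, C = reduced_matrix(A, B)
    evals, Y = np.linalg.eigh(C)                      # ascending order
    x = np.linalg.solve(D.T, Y[:, -1])                # back-transform x = D^{-T} y
    return evals[-1], x / np.linalg.norm(x)

# Hypothetical exact pair (A, B) and perturbed pair (A_hat, B_hat).
rng = np.random.default_rng(5)
n, delta = 8, 1e-2
M = rng.standard_normal((n, n))
A = M + M.T
K = rng.standard_normal((n, n))
B = K @ K.T + n * np.eye(n)
P1 = rng.standard_normal((n, n))
P2 = rng.standard_normal((n, n))
A_hat = A + delta * (P1 + P1.T)
B_hat = B + delta * (P2 + P2.T)                       # still positive definite for this delta

lam1, x1 = top_gsep_pair(A, B)
r = A_hat @ x1 - lam1 * (B_hat @ x1)                  # residual vector
E = -np.outer(r, x1)                                  # one choice with (A_hat + E) x1 = lam1 B_hat x1
lam_hat = np.linalg.eigvalsh(reduced_matrix(A_hat, B_hat)[1])

bound = np.linalg.norm(np.linalg.inv(B_hat), 2) * np.linalg.norm(r)
dist = np.min(np.abs(lam_hat - lam1))
print(dist, bound)
```

The check only tells us that some eigenvalue of (35) is close to $\lambda_1$, not which one; this is exactly the limitation discussed next.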


The disadvantage of the above procedure is that we obtain an existence result, which gives no information on how the eigenpair $(\hat\lambda_i, \hat x_i)$ is related to $(\hat\lambda_1, \hat x_1)$. This is a typical downside of a posteriori methods, which are designed to measure how far a computed solution is from the nearest exact solution, but not to compare ordered eigenpairs. A helpful result is the absolute Weyl theorem for generalized Hermitian definite matrix pairs, established by Y. Nakatsukasa [21]. For the reader's convenience we state the theorem below in the form presented in [22, Theorem 8.3].


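Theorem 27. Let $\lambda_1 \ge \dots \ge \lambda_n$ and $\hat\lambda_1 \ge \dots \ge \hat\lambda_n$ be, respectively, the exact and the approximated eigenvalues of problems (34) and (35). Denote $\Delta A = \hat A - A$ and $\Delta B = \hat B - B$. Then

$$|\lambda_i - \hat\lambda_i| \le \|B^{-1}\|\,\|\Delta A - \hat\lambda_i\Delta B\| \qquad\text{and}\qquad |\lambda_i - \hat\lambda_i| \le \|\hat B^{-1}\|\,\|\Delta A - \lambda_i\Delta B\|$$

for all $i = 1, \dots, n$.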
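Unlike Theorem 26, Theorem 27 compares eigenvalues with the same index. The sketch below evaluates both right-hand sides for a hypothetical random pair and perturbation and compares them with $|\lambda_i - \hat\lambda_i|$ for every $i$; the matrices and the perturbation size are illustrative assumptions, and the spectral norm is used throughout.

```python
import numpy as np

def gsep_eigenvalues_desc(A, B):
    """Eigenvalues of A x = lam B x in decreasing order (A symmetric, B SPD)."""
    D = np.linalg.cholesky(B)
    C = np.linalg.solve(D, np.linalg.solve(D, A).T)   # D^{-1} A D^{-T}
    return np.linalg.eigvalsh(0.5 * (C + C.T))[::-1]

def weyl_bounds(A, B, A_hat, B_hat):
    """|lambda_i - lambda_hat_i| and both right-hand sides of Theorem 27, for every index i."""
    lam = gsep_eigenvalues_desc(A, B)
    lam_hat = gsep_eigenvalues_desc(A_hat, B_hat)
    dA, dB = A_hat - A, B_hat - B
    inv_B = np.linalg.norm(np.linalg.inv(B), 2)
    inv_B_hat = np.linalg.norm(np.linalg.inv(B_hat), 2)
    b1 = np.array([inv_B * np.linalg.norm(dA - lam_hat_i * dB, 2) for lam_hat_i in lam_hat])
    b2 = np.array([inv_B_hat * np.linalg.norm(dA - lam_i * dB, 2) for lam_i in lam])
    return np.abs(lam - lam_hat), b1, b2

rng = np.random.default_rng(6)
n = 7
M = rng.standard_normal((n, n))
A = M + M.T
K = rng.standard_normal((n, n))
B = K @ K.T + n * np.eye(n)
P1 = rng.standard_normal((n, n))
P2 = rng.standard_normal((n, n))
A_hat = A + 0.1 * (P1 + P1.T)
B_hat = B + 0.1 * (P2 + P2.T)      # remains positive definite for this perturbation size
diff, b1, b2 = weyl_bounds(A, B, A_hat, B_hat)
print(np.c_[diff, b1, b2])
```

Because the bounds hold index by index, they identify the perturbed eigenpair that corresponds to $(\lambda_1, x_1)$, which is the information missing from Theorem 26.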